From mboxrd@z Thu Jan 1 00:00:00 1970 From: Linus Torvalds Subject: Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle Date: Fri, 20 Oct 2006 12:46:39 -0700 (PDT) Message-ID: References: <9e4733910610140807p633f5660q49dd2d2111c9f5fe@mail.gmail.com> <200610201821.34712.jnareb@gmail.com> <20061020181210.GA29843@artax.karlin.mff.cuni.cz> <200610202047.11291.jnareb@gmail.com> <45391F1C.80100@utoronto.ca> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: Jakub Narebski , Jan Hudec , bazaar-ng@lists.canonical.com, git@vger.kernel.org X-From: git-owner@vger.kernel.org Fri Oct 20 21:47:23 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1Gb0Kn-0007yy-K9 for gcvg-git@gmane.org; Fri, 20 Oct 2006 21:47:13 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2992722AbWJTTrJ (ORCPT ); Fri, 20 Oct 2006 15:47:09 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S2992727AbWJTTrJ (ORCPT ); Fri, 20 Oct 2006 15:47:09 -0400 Received: from smtp.osdl.org ([65.172.181.4]:29646 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S2992722AbWJTTrI (ORCPT ); Fri, 20 Oct 2006 15:47:08 -0400 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id k9KJkiaX002636 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Fri, 20 Oct 2006 12:46:45 -0700 Received: from localhost (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id k9KJke1H032076; Fri, 20 Oct 2006 12:46:41 -0700 To: Aaron Bentley In-Reply-To: <45391F1C.80100@utoronto.ca> X-Spam-Status: No, hits=-0.976 required=5 tests=AWL,OSDL_HEADER_SUBJECT_BRACKETED X-Spam-Checker-Version: SpamAssassin 2.63-osdl_revision__1.95__ X-MIMEDefang-Filter: osdl$Revision: 1.155 $ X-Scanned-By: MIMEDefang 2.36 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, 20 Oct 2006, Aaron Bentley wrote: > > Linus Torvalds wrote: > > Git goes one step further: it _really_ doesn't matter about how you got to > > a certain state. Absolutely _none_ of what the commits in between the > > final stages and the common ancestor matter in the least. The only thing > > that matters is what the states at the end-point are. > > That's interesting, because I've always thought one of the strengths of > file-ids was that you only had to worry about end-points, not how you > got there. > > How do you handle renames without looking at the history? You first handle all the non-renames that just merge on their own. That takes care of 99.99% of the stuff (and I'm not exaggerating: in the kernel, you have ~21000 files, and most merges don't have a single rename to worry about - and even when you do have them, they tend to be in the "you can count them on one hand" kind of situation). Then you just look at all the pathnames you _couldn't_ resolve, and that's usually cut down the thing to something where you can literally use a lot of CPU power per file, because now you only have a small number of candidates left. If you were to use one hundredth of a second per file regardless of file, a stupid per-file merge would take 210 seconds, which is just unacceptable. So you really don't want to do that. You want to merge whole subdirectories in one go (and with git, you can: since the SHA1 of a directory defines _all_ of the contents under it, if the two branches you merge have an identical subdirectory, you don't need to do anything at _all_ about that one. See?). So instead of trying to be really fast on individual files and doing them one at a time, git makes individual files basically totally free (you literally often don't need to look at them AT ALL). And then for the few files you can't resolve, you can afford to spend more time. So say that you spend one second per file-pair because you do complex heuristics etc - you'll still have a merge that is a _lot_ faster than your 210-second one. So recursive basically generates the matrix of similarity for the new/deleted files, and tries to match them up, and there you have your renames - without ever looking at the history of how you ended up where you are. Btw, that "210 second" merge is not at all unlikely. Some of the SCM's seem to scale much worse than that to big archives, and I've heard people talk about merges that took 20 minutes or more. In contrast, git doing a merge in ~2-3 seconds for the kernel is _normal_. [ In fact, I just re-tested doing my last kernel merge: it took 0.970 seconds, and that was _including_ the diffstat of the result - not obviously not including the time to fetch the other branch over the network. I don't know if people appreciate how good it is to do a merge of two 21000-file branches in less than a second. It didn't have any renames, and it only had a single well-defined common parent, but not only is that the common case, being that fast for the simple case is what _allows_ you to do well on the complex cases too, because it's what gets rid of all the files you should _not_ worry about ] Performance does matter. Linus