From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Zed A. Shaw" Subject: Introductions Date: Sat, 16 Apr 2005 12:50:15 -0400 Organization: ZedShaw.com Message-ID: <1113670215.6025.36.camel@thamachine> Reply-To: zedshaw@zedshaw.com Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-From: git-owner@vger.kernel.org Sat Apr 16 18:48:54 2005 Return-path: Received: from vger.kernel.org ([12.107.209.244]) by ciao.gmane.org with esmtp (Exim 4.43) id 1DMqT0-0002l9-CM for gcvg-git@gmane.org; Sat, 16 Apr 2005 18:48:22 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262703AbVDPQvt (ORCPT ); Sat, 16 Apr 2005 12:51:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262702AbVDPQvt (ORCPT ); Sat, 16 Apr 2005 12:51:49 -0400 Received: from smtp04.mrf.mail.rcn.net ([207.172.4.63]:11930 "EHLO smtp04.mrf.mail.rcn.net") by vger.kernel.org with ESMTP id S262703AbVDPQvQ (ORCPT ); Sat, 16 Apr 2005 12:51:16 -0400 Received: from 207-237-211-198.c3-0.80w-ubr3.nyr-80w.ny.cable.rcn.com (HELO mythtv) (207.237.211.198) by smtp04.mrf.mail.rcn.net with ESMTP; 16 Apr 2005 12:51:16 -0400 X-IronPort-AV: i="3.92,106,1112587200"; d="scan'208"; a="23176828:sNHT20447004" To: git@vger.kernel.org X-Mailer: Evolution 2.2.1.1 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Hi, Just a short message to introduce myself and give a shameless plug. I'm Zed A. Shaw and I'm the author of a little unknown SCM called FastCST (http://www.zedshaw.com/projects/fastcst ). While I doubt that Linus would ever adopt fastcst as his tool (and I probably wouldn't want him too since it's not quite ready for prime time) I did find many of the discussions on the list so far very interesting. Some sent me Linus' message about wanting to do a diff on the whole source tree, and just thought I'd mention that I already tried this in FastCST. FastCST uses a suffix array to construct a delta (not a diff), so I thought it might be possible to simply apply the delta algorithm to the entire source tree and get very small changesets. It worked on small source trees, but when it came to the Linux 2.6 tree it choked hard. Even with an efficient suffix array implementation, you're talking about performing a diff/delta on 225M of source. Added to the problem is that you have to track file locations within the massive blob. In the end, it also wasn't much more efficient from a size/space/time perspective so I dropped it. My current solution to Linus' problem is to use an inverted index to process all the sources and revisions on the fly as they are created. Using the inverted index, I'm able to VERY quickly find any chunk of source in files or revisions. This lets me track things like how functions move through the files, where chunks of code moved to, etc. In the end this turns out to be much more efficient (7 seconds on my computer to find all references to "sprintf" in the Linux 2.6 source) as I can use the super small deltas for distributing changes, and give developers a means tracking content changes across "the world" in a simple search format. Anyway, just thought I'd throw in my experiences attempting what Linus is talking about. I actually agree with him that rename tracking isn't that great, but I've come to the conclusion that tracking renames is actually a specific case of just a general search problem. Different strokes for different folks I guess. Other than that, I'm mostly interested in reading the messages and probably won't write anything unless people ask me directly for something. Thanks! Zed A. Shaw