git.vger.kernel.org archive mirror
* Finding file revisions
@ 2005-04-27 16:50 Chris Mason
  2005-04-27 17:34 ` Linus Torvalds
  2005-04-28 16:08 ` Daniel Barkalow
  0 siblings, 2 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-27 16:50 UTC (permalink / raw)
  To: git

Hello everyone,

I haven't seen a tool yet to find which changeset modified a given file, so 
I whipped up something.  The basic idea is to:

for each changeset in rev-list
	for each file in diff-tree -r parent changeset
		match against desired files

Is there a faster way?  This will scale pretty badly as the tree grows, but 
I usually only want to search back a few months in the history.  So, it 
might make sense to limit the results by date or commit/tag.

Usage:
file-changes [-c commit id] file1 ...

The file names can be perl regular expressions, and it will match any file 
starting with the expression listed.  So "file-changes fs/ext" will show 
everything in ext2 and ext3.
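
The "starts with" rule can be sketched in C as below. This is only an illustration: it assumes literal prefixes rather than Perl regular expressions, and `path_is_wanted` is a hypothetical name that does not appear in file-changes.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if path starts with any of the n literal prefixes in wanted. */
int path_is_wanted(const char *path, const char *const *wanted, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* compare only the first strlen(prefix) bytes of the path */
		if (strncmp(path, wanted[i], strlen(wanted[i])) == 0)
			return 1;
	}
	return 0;
}
```

With the single prefix "fs/ext", both fs/ext2/inode.c and fs/ext3/inode.c match, while fs/xfs/inode.c does not.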

Example output:

diff-tree -r 56022b4d00cae3ff816d3ff05d9f8a80e1517c60 9bd104d712d710d53c35166e40bd5fe24caf893e
8a796b48e757e56b50802c28abf28e0199c45ad9->2db368df614de4799be2d1baffb6563dbe1b8926 fs/ext2/inode.c
dbc8fd9bab639b84b8cc94fdbbf850b1e4bf1b2b->a4cd819734ba2eea9d5d21039deca62057f72d44 fs/ext3/inode.c
cat-file commit 9bd104d712d710d53c35166e40bd5fe24caf893e
    tree cd4e40eae003e29c0d3be2aa769c3b572ab1b488
    parent 56022b4d00cae3ff816d3ff05d9f8a80e1517c60
    author mason <mason@coffee> 1114617717 -0400
    committer mason <mason@coffee> 1114617717 -0400

    comments go here

This is meant for cut n' paste.  If you find a changeset comment you like, 
run the diff-tree -r command on the first line to see a diff of the 
changeset (maybe I should add | diff-tree-helper here?)

-chris


#!/usr/bin/perl

use strict;

my $last;
my $ret;
my $i;
my @wanted = ();
my $matched;
my $argc = scalar(@ARGV);
my $commit;

sub print_usage() {
    print STDERR "usage: file-changes [-c commit] file_list\n";
    exit(1);
}

if ($argc < 1) {
    print_usage();
}

for ($i = 0 ; $i < $argc ; $i++)  {
    if ($ARGV[$i] eq "-c") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$commit = $ARGV[++$i];
    } else {
	push @wanted, $ARGV[$i];
    }
}

if (!defined($commit)) {
    $commit = `commit-id`;
    if ($?) {
    	print STDERR "commit-id failed, try using -c to specify a commit\n";
	exit(1);
    }
    chomp $commit;
}

$last = $commit;

open(RL, "rev-list $commit|") || die "rev-list failed";
while(<RL>) {
    chomp;
    my $cur = $_;
    $matched = 0;
    if ($cur eq $last) {
        next;
    }
    # rev-list gives us the commits from newest to oldest
    open(DT, "diff-tree -r $cur $last|") || die "diff-tree failed";
    while(<DT>) {
        chomp;
	my @words = split;
	my $file = $words[3];
	# if the filename has whitespace, suck it in
	if (scalar(@words) > 4) {
	    if (m/\Q$file\E(.*)/) {
	        $file .= $1;
	    }
	}
	foreach my $m (@wanted) {
	    if ($file =~ m/^$m/) {
		if (!$matched) {
		    print "diff-tree -r $cur $last\n";
		}
		print "$words[2] $file\n";
		$matched = 1;
	    }
	}
    }
    close(DT);
    if ($?) {
	$ret = $? >> 8;
	die "diff-tree failed with $ret";
    }
    if ($matched) {
	print "cat-file commit $last\n";
	open(COMMIT, "cat-file commit $last|") || die "cat-file $last failed";
	while(<COMMIT>) {
	    print "    $_";
	}
	close(COMMIT);
	if ($?) {
	    $ret = $? >> 8;
	    die "cat-file failed with $ret";
	}
	print "\n";
    }
    $last = $cur;
}

close(RL);
if ($?) {
    $ret = $? >> 8;
    die "rev-list failed with $ret";
}


* Re: Finding file revisions
  2005-04-27 16:50 Finding file revisions Chris Mason
@ 2005-04-27 17:34 ` Linus Torvalds
  2005-04-27 18:23   ` Chris Mason
  2005-04-27 18:41   ` Thomas Gleixner
  2005-04-28 16:08 ` Daniel Barkalow
  1 sibling, 2 replies; 25+ messages in thread
From: Linus Torvalds @ 2005-04-27 17:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: git



On Wed, 27 Apr 2005, Chris Mason wrote:
> 
> I haven't seen a tool yet to find which changeset modified a given file, so 
> I whipped up something.  The basic idea is to:
> 
> for each changeset in rev-list
> 	for each file in diff-tree -r parent changeset
> 		match against desired files
> 
> Is there a faster way? 

Yes. Tell "diff-tree" what your desired files are, and it will cut down 
the amount of work by a _lot_ (because then diff-tree doesn't need to 
recurse into subdirectories that don't matter).

So you should just do

	for each changeset in rev-list
	do 
		diff-tree -r parent changeset <file-list>
	...

instead. 

> This will scale pretty badly as the tree grows, but 
> I usually only want to search back a few months in the history.  So, it 
> might make sense to limit the results by date or commit/tag.

With more history, "rev-list" should do basically the right thing: it will
be constant-time for _recent_ commits, and it is linear time in how far
back you want to go. Which seems quite reasonable.

And diff-tree is obviously constant-time (and very fast at that, 
especially if you limit it to just a few files, since then it won't even 
bother with any other subdirectories).
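
To illustrate why a file list saves work, here is a rough sketch of the pruning test a path-limited tree walk could apply before entering a directory. It is not the actual diff-tree code; `must_descend` is a hypothetical helper, and it assumes directories are written with a trailing slash.

```c
#include <string.h>

/*
 * Decide whether a tree walk must descend into directory `dir`
 * (written with a trailing slash, e.g. "fs/ext2/") when only
 * paths starting with `spec` are of interest.
 */
int must_descend(const char *dir, const char *spec)
{
	size_t dirlen = strlen(dir);
	size_t speclen = strlen(spec);

	if (speclen >= dirlen)
		/* spec lies inside dir: "fs/" is on the way to "fs/ext2/inode.c" */
		return strncmp(dir, spec, dirlen) == 0;
	/* dir lies inside spec: "fs/ext2/" is below "fs/" */
	return strncmp(dir, spec, speclen) == 0;
}
```

Every directory for which this returns 0 is skipped without reading its tree object, which is where the saving comes from.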

		Linus


* Re: Finding file revisions
  2005-04-27 17:34 ` Linus Torvalds
@ 2005-04-27 18:23   ` Chris Mason
  2005-04-27 22:19     ` Linus Torvalds
  2005-04-28 13:01     ` David Woodhouse
  2005-04-27 18:41   ` Thomas Gleixner
  1 sibling, 2 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-27 18:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]

On Wednesday 27 April 2005 13:34, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > Is there a faster way?
>
> Yes. Tell "diff-tree" what your desired files are, and it will cut down
> the amount of work by a _lot_ (because then diff-tree doesn't need to
> recurse into subdirectories that don't matter).

Thanks.  I originally called diff-tree without the file list so that I could 
do the regexp matching, but this is probably one of those features that will 
never get used.

My test case here is a tree with 400 commits, giving diff-tree the file list 
brings us down from 16s to 9s on a cold cache.  Hot cache is about 1.5 
seconds on both.

>
> > This will scale pretty badly as the tree grows, but
> > I usually only want to search back a few months in the history.  So, it
> > might make sense to limit the results by date or commit/tag.
>
> With more history, "rev-list" should do basically the right thing: it will
> be constant-time for _recent_ commits, and it is linear time in how far
> back you want to go. Which seems quite reasonable.
>
> And diff-tree is obviously constant-time (and very fast at that,
> especially if you limit it to just a few files, since then it won't even
> bother with any other subdirectories).

Usually the question I will want to ask is "how did foo.c change since tag X", 
which usually won't go back more than a few months.  This should be 
reasonable, and I'd rather not slow down common operations adding extra 
indexing for the uncommon file-changes run.

So, new prog attached.  New usage:

file-changes [-c commit_id] [-s commit_id] file ...

-c is the commit where you want to start searching
-s is the commit where you want to stop searching

-chris

[-- Attachment #2: file-changes --]
[-- Type: application/x-perl, Size: 2027 bytes --]


* Re: Finding file revisions
  2005-04-27 17:34 ` Linus Torvalds
  2005-04-27 18:23   ` Chris Mason
@ 2005-04-27 18:41   ` Thomas Gleixner
  2005-04-28 15:24     ` Linus Torvalds
  1 sibling, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2005-04-27 18:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git

On Wed, 2005-04-27 at 10:34 -0700, Linus Torvalds wrote:

> > This will scale pretty badly as the tree grows, but 
> > I usually only want to search back a few months in the history.  So, it 
> > might make sense to limit the results by date or commit/tag.
> 
> With more history, "rev-list" should do basically the right thing: it will
> be constant-time for _recent_ commits, and it is linear time in how far
> back you want to go. Which seems quite reasonable.

Which is quite horrible, if you have a 500k+ blobs repo.

I know you are database allergic, but there a database is the correct
solution. With all the relations between those file/tree/commit
blobs stored in a database, it takes <20ms to retrieve a list of all those
file blobs in historical order, with some context information. That's
not on a monster machine, it's on an ordinary Walmart PC.
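
As a rough illustration of the indexing idea (not a description of the database actually used here), the sketch below keeps a sorted array of path/commit records, so that a lookup is a binary search instead of a history walk. `struct path_rev`, `index_sort` and `index_count` are hypothetical names, and the commit ids are placeholders.

```c
#include <stdlib.h>
#include <string.h>

struct path_rev {
	const char *path;
	const char *commit;	/* placeholder commit id */
};

static int cmp_path(const void *a, const void *b)
{
	const struct path_rev *x = a, *y = b;

	return strcmp(x->path, y->path);
}

/* Sort the index once; lookups afterwards do not walk history. */
void index_sort(struct path_rev *idx, size_t n)
{
	qsort(idx, n, sizeof(*idx), cmp_path);
}

/* Count how many recorded commits touched `path`; idx must be sorted. */
size_t index_count(const struct path_rev *idx, size_t n, const char *path)
{
	struct path_rev key = { path, NULL };
	const struct path_rev *hit = bsearch(&key, idx, n, sizeof(*idx), cmp_path);
	size_t count = 0;

	if (!hit)
		return 0;
	while (hit > idx && strcmp(hit[-1].path, path) == 0)
		hit--;		/* rewind to the first entry for this path */
	for (; hit < idx + n && strcmp(hit->path, path) == 0; hit++)
		count++;
	return count;
}
```

The trade-off is the one discussed in this thread: the index has to be built and kept up to date on every commit, in exchange for fast per-file queries.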

tglx




* Re: Finding file revisions
  2005-04-27 18:23   ` Chris Mason
@ 2005-04-27 22:19     ` Linus Torvalds
  2005-04-27 22:31       ` Chris Mason
                         ` (2 more replies)
  2005-04-28 13:01     ` David Woodhouse
  1 sibling, 3 replies; 25+ messages in thread
From: Linus Torvalds @ 2005-04-27 22:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: git



On Wed, 27 Apr 2005, Chris Mason wrote:
> 
> So, new prog attached.  New usage:
> 
> file-changes [-c commit_id] [-s commit_id] file ...
> 
> -c is the commit where you want to start searching
> -s is the commit where you want to stop searching

Your script will do some funky stuff, because you incorrectly think that
the rev-list is sorted linearly. It's not. It's sorted in a rough
chronological order, but you really can't do the "last" vs "cur" thing
that you do, because two commits after each other in the rev-list listing
may well be from two totally different branches, so when you compare one
tree against the other, you're really doing something pretty nonsensical.

diff-tree will happily compare trees that aren't related, so it will 
"work" in a sense, but it doesn't actually do what you think it does ;)

So what you should do is basically something like

	open(RL, "rev-list $commit|") || die "rev-list failed";
	while(<RL>) {
		chomp;
		my $cur = $_;

(so far so good) but then you should look at the _parents_ of that 
commit, ie do (NOTE NOTE NOTE! I'm a total perl idiot, so I'm not going to 
do this right):

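		open(PARENT, "cat-file commit $cur|") || die "cat-file failed";
		while(<PARENT>) {
			chomp;
			my @words = split;
			# the header starts with "tree <sha1>", then one
			# "parent <sha1>" line per parent
			next if (@words && $words[0] eq "tree");
			# anything else (author, blank line) ends the parents
			last if (!@words || $words[0] ne "parent");
			test_diff($cur, $words[1]);
		}
		close(PARENT);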
	}
	close(RL);

and now your "test_diff()" thing can do the tree diff.

That way you actually do "tree-diff" on the thing you should do, and it 
will show you _which_ way it changed in a merge (ie if you hit a 
merge-point, it will do a tree-diff against both parents, and show you 
which one had the difference - then you'll obviously usually see that same 
difference later on when you dig down to the actual changeset that did it 
too).

Remember: time is not a nice linear stream.
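
The difference between comparing neighbours in the list and comparing against parents can be sketched on a toy history. The model below is an assumption made purely for illustration: commits are array entries, parents are array indexes, and an integer stands in for the blob SHA-1 of one file. None of these names come from git.

```c
#include <stdio.h>

#define MAX_PARENTS 2

struct toy_commit {
	const char *name;
	int nparents;
	int parents[MAX_PARENTS];	/* indexes into the commit array */
	int file_version;		/* stand-in for the blob SHA-1 of one file */
};

/*
 * For every commit, compare the file against each parent and report
 * the (parent, commit) pairs where it differs. Returns the number
 * of differing pairs.
 */
int report_changes(const struct toy_commit *c, int n)
{
	int hits = 0;

	for (int i = 0; i < n; i++) {
		for (int p = 0; p < c[i].nparents; p++) {
			const struct toy_commit *parent = &c[c[i].parents[p]];

			if (parent->file_version != c[i].file_version) {
				printf("%s -> %s\n", parent->name, c[i].name);
				hits++;
			}
		}
	}
	return hits;
}
```

Take a history listed newest first as merge, b, a, root, where a and b are both children of root and only a changed the file. The per-parent walk reports root -> a and b -> merge. Comparing neighbours in the list would also "find" a change between b and a, two commits that sit on different branches and are not parent and child.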

		Linus


* Re: Finding file revisions
  2005-04-27 22:19     ` Linus Torvalds
@ 2005-04-27 22:31       ` Chris Mason
  2005-04-28  8:41         ` Simon Fowler
  2005-04-28 11:45       ` Chris Mason
  2005-04-28 13:09       ` David Woodhouse
  2 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2005-04-27 22:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > So, new prog attached.  New usage:
> >
> > file-changes [-c commit_id] [-s commit_id] file ...
> >
> > -c is the commit where you want to start searching
> > -s is the commit where you want to stop searching
>
> Your script will do some funky stuff, because you incorrectly think that
> the rev-list is sorted linearly. It's not. It's sorted in a rough
> chronological order, but you really can't do the "last" vs "cur" thing
> that you do, because two commits after each other in the rev-list listing
> may well be from two totally different branches, so when you compare one
> tree against the other, you're really doing something pretty nonsensical.

Aha, didn't realize that one.  Thanks, I'll rework things here.

-chris


* Re: Finding file revisions
  2005-04-27 22:31       ` Chris Mason
@ 2005-04-28  8:41         ` Simon Fowler
  2005-04-28 11:56           ` Chris Mason
  0 siblings, 1 reply; 25+ messages in thread
From: Simon Fowler @ 2005-04-28  8:41 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git


[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]

On Wed, Apr 27, 2005 at 06:31:47PM -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached.  New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
> 
> Aha, didn't realize that one.  Thanks, I'll rework things here.
> 
I've got a version of this written in C that I've been working on
for a bit - some example output:

+040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
+100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
+100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
+100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4

You run it as:
find-changes commit_id file_prefix ...

The file_prefix is a path prefix to match - it's not as flexible as
regexes, but it shouldn't be too much less useful.

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #1.2: find-changes.diff --]
[-- Type: text/plain, Size: 8905 bytes --]

Find commits that changed files matching the prefix given on the command line.

Signed-off-by: Simon Fowler <simon@dreamcraft.com.au>
---

Index: Makefile
===================================================================
--- c3aa1e6b53cc59d5fbe261f3f859584904ae3a63/Makefile  (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile  (mode:100644)
@@ -38,7 +38,7 @@
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper find-changes
 
 SCRIPT=	commit-id tree-id parent-id cg-Xdiffdo cg-Xmergefile \
 	cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
Index: find-changes.c
===================================================================
--- /dev/null  (tree:c3aa1e6b53cc59d5fbe261f3f859584904ae3a63)
+++ uncommitted/find-changes.c  (mode:100644 sha1:64c0c3627d84969ee1596b05f97705455fba1871)
@@ -0,0 +1,279 @@
+/*
+ * find-changes.c - find the commits that changed a particular file.
+ */
+
+#include "cache.h"
+//#include "revision.h"
+#include "commit.h"
+#include <sys/param.h>
+
+/* 
+ * This is a simple tool that walks through the revisions cache and
+ * checks the parent-child diffs to see if they include the given
+ * filename. 
+ */
+
+static int recursive = 1;
+static int found = 0;
+
+static char *malloc_base(const char *base, const char *path, int pathlen)
+{
+	int baselen = strlen(base);
+	char *newbase = malloc(baselen + pathlen + 2);
+	memcpy(newbase, base, baselen);
+	memcpy(newbase + baselen, path, pathlen);
+	memcpy(newbase + baselen + pathlen, "/", 2);
+	return newbase;
+}
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+	void *buf = *bufp;
+	unsigned long size = *sizep;
+	int len = strlen(buf) + 1 + 20;
+
+	if (size < len)
+		die("corrupt tree file");
+	*bufp = buf + len;
+	*sizep = size - len;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+{
+	int len = strlen(tree)+1;
+	const unsigned char *sha1 = tree + len;
+	const char *path = strchr(tree, ' ');
+
+	if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+		die("corrupt tree file");
+	*pathp = path+1;
+	return sha1;
+}
+
+static int check_file(void *tree, unsigned long size, const char *base, const char *target);
+
+/* A whole sub-tree went away or appeared */
+static int check_tree(void *tree, unsigned long size, const char *base, const char *target)
+{
+	int retval = 0;
+
+	while (size && !retval) {
+		retval = check_file(tree, size, base, target);
+		update_tree_entry(&tree, &size);
+	}
+	return retval;
+}
+
+/* A file entry went away or appeared.
+ * Check the entire subtree under this, and set the global 'found' flag
+ * if we find the target. */
+static int check_file(void *tree, unsigned long size, const char *base, const char *target)
+{
+	unsigned mode;
+	const char *path;
+	char full_path[MAXPATHLEN + 1];
+	int pathlen, retval;
+	const unsigned char *sha1 = extract(tree, size, &path, &mode);
+
+	pathlen = snprintf(full_path, MAXPATHLEN, "%s%s", base, path);
+	if (!cache_name_compare(full_path, pathlen, target, strlen(target)))
+		found = 1;
+
+	if (recursive && S_ISDIR(mode)) {
+		char type[20];
+		unsigned long size;
+		char *newbase = malloc_base(base, path, strlen(path));
+		void *tree;
+
+		tree = read_sha1_file(sha1, type, &size);
+		if (!tree || strcmp(type, "tree"))
+			die("corrupt tree sha %s", sha1_to_hex(sha1));
+
+		retval = check_tree(tree, size, newbase, target);
+		
+		free(tree);
+		free(newbase);
+		return retval;
+	}
+	return 0;
+}
+	
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, const char *target);
+
+/* the diff-tree algorithm depends on compare_tree_entry returning basically
+ * the same thing that memcmp() would on the filenames - this is important
+ * because the directories are sorted, and hence you need to decide what */
+static int compare_tree_entry(void *tree1, unsigned long size1, 
+			      void *tree2, unsigned long size2, 
+			      const char *base, const char *target)
+{
+	unsigned mode1, mode2;
+	const char *path1, *path2;
+	const unsigned char *sha1, *sha2;
+	int cmp, pathlen1, pathlen2;
+
+	if (found)
+		return 0;
+
+	sha1 = extract(tree1, size1, &path1, &mode1);
+	sha2 = extract(tree2, size2, &path2, &mode2);
+
+	pathlen1 = strlen(path1);
+	pathlen2 = strlen(path2);
+	cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
+	/* these files are different - if this is a directory then the
+	 * contents of the subtree are all different. So, we need to
+	 * run over the subtree and see if our target is in there
+	 * . . . */
+	if (cmp) {
+		check_file(tree1, size1, base, target);
+		check_file(tree2, size2, base, target);
+		return cmp;
+	}
+
+	if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
+		return 0;
+
+	/*
+	 * If the filemode has changed to/from a directory from/to a regular
+	 * file, we need to consider it a remove and an add.
+	 */
+	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
+		check_file(tree1, size1, base, target);
+		check_file(tree2, size2, base, target);
+		return 0;
+	}
+
+	if (recursive && S_ISDIR(mode1)) {
+		int retval;
+		char *newbase = malloc_base(base, path1, pathlen1);
+		retval = diff_tree_sha1(sha1, sha2, newbase, target);
+		free(newbase);
+		return retval;
+	}
+	
+	check_file(tree1, size1, base, target);
+	check_file(tree2, size2, base, target);
+	return 0;
+}
+
+static int diff_tree(void *tree1, unsigned long size1, void *tree2, unsigned long size2, 
+		     const char *base, const char *target)
+{
+	while (size1 | size2) {
+		if (!size1) {
+			check_file(tree2, size2, base, target);
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		if (!size2) {
+			check_file(tree1, size1, base, target);
+			update_tree_entry(&tree1, &size1);
+			continue;
+		}
+		switch (compare_tree_entry(tree1, size1, tree2, size2, base, target)) {
+		case -1:
+			update_tree_entry(&tree1, &size1);
+			continue;
+		case 0:
+			update_tree_entry(&tree1, &size1);
+			/* Fallthrough */
+		case 1:
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		die("diff-tree: internal error");
+	}
+	return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base,
+			  const char *target)
+{
+	void *tree1, *tree2;
+	unsigned long size1, size2;
+	char type[20];
+	int retval;
+
+	tree1 = read_sha1_file(old, type, &size1);
+	if (!tree1 || strcmp(type, "tree"))
+		die("unable to read source tree %s", sha1_to_hex(old));
+	tree2 = read_sha1_file(new, type, &size2);
+	if (!tree2 || strcmp(type, "tree"))
+		die("unable to read destination tree %s", sha1_to_hex(new));
+	retval = diff_tree(tree1, size1, tree2, size2, base, target);
+	free(tree1);
+	free(tree2);
+	return retval;
+}
+
+static int process_diffs(struct commit *parent, struct commit *commit, const char *target)
+{
+	found = 0;
+	diff_tree_sha1(parent->tree->object.sha1, commit->tree->object.sha1, "", target);
+	if (found)
+		printf("%s\n", sha1_to_hex(commit->object.sha1));
+	return 0;
+}
+
+/*
+ * Walk the set of parents, and collect a list of the objects. 
+ */
+void process_commit(struct commit *item)
+{
+	struct commit_list *parents;
+
+	if (parse_commit(item))
+		die("unable to parse commit %s", sha1_to_hex(item->object.sha1));
+	
+	parents = item->parents;
+	while (parents) {
+		process_commit(parents->item);
+		parents = parents->next;
+	}
+}
+
+/*
+ * Usage: find-changes <parent-id> <filename>
+ *
+ * Note that this code will find the commits that change the given
+ * file in the set of commits that are parents of the one given on the
+ * command line.
+ */ 
+
+int main(int argc, char **argv)
+{
+	int i;
+	unsigned char sha1[20];
+	struct commit *orig;
+
+	if (argc != 3) 
+		usage("find-changes <parent-id> <filename>");
+		
+	get_sha1_hex(argv[1], sha1);
+	orig = lookup_commit(sha1);
+	process_commit(orig);
+	mark_reachable(&orig->object, 1);
+
+	/* this code needs to use tree.c to do most of the work - this
+	 * will simplify things a lot. 
+	 * XXX: rewrite diff-tree.c to do the same. */
+	
+	for (i = 0; i < nr_objs; i++) {
+		struct object *obj = objs[i];
+		struct commit *commit;
+		struct commit_list *p;
+
+		if (obj->type != commit_type)
+			continue;
+
+		commit = (struct commit *) obj;
+
+		p = commit->parents;
+		while (p) {
+			process_diffs(p->item, commit, argv[2]);
+			p = p->next;
+		}
+	}
+	return 0;
+}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: Finding file revisions
  2005-04-27 22:19     ` Linus Torvalds
  2005-04-27 22:31       ` Chris Mason
@ 2005-04-28 11:45       ` Chris Mason
  2005-04-28 16:34         ` Kay Sievers
  2005-04-28 19:11         ` Kay Sievers
  2005-04-28 13:09       ` David Woodhouse
  2 siblings, 2 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-28 11:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1884 bytes --]

On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > So, new prog attached.  New usage:
> >
> > file-changes [-c commit_id] [-s commit_id] file ...
> >
> > -c is the commit where you want to start searching
> > -s is the commit where you want to stop searching
>
> Your script will do some funky stuff, because you incorrectly think that
> the rev-list is sorted linearly. It's not. It's sorted in a rough
> chronological order, but you really can't do the "last" vs "cur" thing
> that you do, because two commits after each other in the rev-list listing
> may well be from two totally different branches, so when you compare one
> tree against the other, you're really doing something pretty nonsensical.

One more rev that should work as you suggested.  Here's the example output 
from a cogito changeset with merges.  I print the diff-tree lines once for each 
matching parent and then print the commit once.  It's very primitive, but
hopefully some day someone will make a gui with happy clicky buttons
for changesets and filerevs.

diff-tree -r 2544d7558f0ce94ab9c163f5b67244f71d8c85b8 69eeae031bf5447e99b9274761e2361e8c5a944e
618fdb616cebbd2fc9f1cddc0b6b75fd575250a1->3579b5fd1182679a39b83eaaa9dd0e7c970f4545 diff-tree.c
diff-tree -r 9831d8f86095edde393e495d7a55cab9d35d5d05 69eeae031bf5447e99b9274761e2361e8c5a944e
2d2913b6b98ac836b43755b1304d2a838dad87dd->4f01bbbbb3fd0e53e9ce968f167b6dae68fcfa92 Makefile
cat-file commit 69eeae031bf5447e99b9274761e2361e8c5a944e
    tree 7510dc1b63e9e690ec73952e40a31e43af4b55bc
    parent 2544d7558f0ce94ab9c163f5b67244f71d8c85b8
    parent 9831d8f86095edde393e495d7a55cab9d35d5d05
    author Petr Baudis <pasky@ucw.cz> 1114544917 +0200
    committer Petr Baudis <xpasky@machine.sinus.cz> 1114544917 +0200

    Merge with rsync://www.kernel.org/pub/linux/kernel/people/torvalds/git.git

-chris

[-- Attachment #2: file-changes --]
[-- Type: application/x-perl, Size: 2385 bytes --]


* Re: Finding file revisions
  2005-04-28  8:41         ` Simon Fowler
@ 2005-04-28 11:56           ` Chris Mason
  2005-04-28 13:13             ` Simon Fowler
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2005-04-28 11:56 UTC (permalink / raw)
  To: simon; +Cc: Linus Torvalds, git

On Thursday 28 April 2005 04:41, Simon Fowler wrote:
> I've got a version of this written in C that I've been working on
> for a bit - some example output:
>
> +040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
> +100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
> +100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
> +100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
> commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4
>
> You run it as:
> find-changes commit_id file_prefix ...
>
> The file_prefix is a path prefix to match - it's not as flexible as
> regexes, but it shouldn't be too much less useful.

I dropped the regexes for speed with diff-tree; they weren't that important to 
me.  The features I was going for are:

1) ability to see the changeset comments in the output.
2) ability to look for revs on more than one file at a time.  The single file 
limit in bk revtool always bugged me.
3) Some quick cut n' paste method to generate the changeset diff.  This is why 
I do diff-tree -r in the output, so I can just copy into a different window 
and go.

Your c version would hopefully end up faster on cpu time by limiting the 
number of times we read/decompress the commit files.

-chris


* Re: Finding file revisions
  2005-04-27 18:23   ` Chris Mason
  2005-04-27 22:19     ` Linus Torvalds
@ 2005-04-28 13:01     ` David Woodhouse
  1 sibling, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2005-04-28 13:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 2865 bytes --]

On Wed, 2005-04-27 at 14:23 -0400, Chris Mason wrote:
> Thanks.  I originally called diff-tree without the file list so that I could 
> do the regexp matching, but this is probably one of those features that will 
> never get used.

When I added this functionality to diff-tree I didn't want to add regexp
support, but I did make sure it could handle the simple case of "changes
within directory xxx/yyy". It can also take _multiple_ names. 

At the same time, I also posted a primitive script which attempted to do
something similar to what you're doing. The output of rev-tree is
useless, as Linus pointed out. Chronological sorting is
counterproductive in all cases and should be avoided _everywhere_.

My script is based on the original 'gitlog.sh' script, which walks the
commit tree from the head to its parents. It lists only those commits
where the file(s) in question actually changed, giving the commit ID and
the changes.

There's one problem with that already documented in my (attached) mail
-- we don't print merge changesets where the file in the child is
identical to the file in all the parents, but the changeset in question
_is_ relevant to the history because it's merging two branches on which
the file _independently_ changed.

The other problem is that we still don't have enough information to
piece together the full tree. With each commit we print, we're also
printing the last _relevant_ child (see $lastprinted in the script). 

That allows us to piece together most of the graph, but when we
eventually reach a commit which has already been processed (but not
necessarily _printed_), we just stop -- so we don't have useful parent
information for the oldest change in each branch and can't tie it back
to the point at which it branched. We know the _immediate_ parent, but
that parent isn't necessarily going to have been one of the commits we
actually printed.

I suspect the best way to do this is to start with a copy of rev-tree
and do something like..

	1. Add a 'struct commit_list children' to 'struct commit'

	2. Make process_commit() set it correctly:
@@ wherever @@ process_commit
	        while (parents) {
	                process_commit(parents->item->object.sha1);
+	                commit_list_insert(obj, &parents->item->children);
	                parents = parents->next;
	        }

	3. Check each 'interesting' commit to see if it affects the
	   file(s) in question.
	   
	4. Prune the tree: For each commit which isn't a merge and which
	   doesn't touch the file(s), just dump it from the tree,
	   changing the child pointer of its parent and the parent
	   pointer of its child accordingly to maintain the tree.
	   For each merge where there are no changes to the file(s)
	   between the merge point and the point at which the branch was
	   taken, drop that too.

	5. Print the remaining commits.
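
Step 4 can be sketched for the simplest case, a history without merges. The code below is only an illustration under that assumption: commits are numbered array slots with a single parent each, and `prune_linear` is a hypothetical name, not something in rev-tree.

```c
/*
 * Commits are numbered 0..n-1; parent[i] is the index of commit i's
 * only parent, or -1 for the root. touches[i] is nonzero when commit i
 * changes the file(s) of interest. On return, kept_parent[i] holds
 * the nearest ancestor of i that touches the file (or -1), which is
 * the parent pointer i would have after the uninteresting commits in
 * between are dropped.
 */
void prune_linear(const int *parent, const int *touches, int n, int *kept_parent)
{
	for (int i = 0; i < n; i++) {
		int p = parent[i];

		while (p != -1 && !touches[p])
			p = parent[p];	/* skip over a commit that gets dropped */
		kept_parent[i] = p;
	}
}
```

Handling merges needs the extra rule from step 4 (a merge is dropped only when the file did not change between the branch point and the merge point), which this sketch does not attempt.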


-- 
dwmw2

[-- Attachment #2: Attached message - Re: [GIT PATCH] Selective diff-tree --]
[-- Type: message/rfc822, Size: 7296 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 2904 bytes --]

On Wed, 2005-04-13 at 14:57 +0100, David Woodhouse wrote:
> The plan is that this will also form the basis of a tool which will report the
> revision tree for a given file, which is why I really want to avoid the
> unnecessary recursion rather than just post-processing the output.

Script attached. Its output is something like this:

commit 97c9a63e76bf667c21f24a5cfa8172aff0dd1294 child
*100664->100644 blob    6e4064e920792d5b0219b9f8f55a38ab4a1af856->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac      rev-tree.c

--------------------------
commit 13b6f29ac1686955e15f0250f796362460b4992e child 97c9a63e76bf667c21f24a5cfa8172aff0dd1294
*100644->100644 blob    5b3090780d49cc610339a19f070a5954dce9a8bc->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac      rev-tree.c

--------------------------
commit 6420f0732f695269c0e3f28e62ed4b9aa6578d9f child 13b6f29ac1686955e15f0250f796362460b4992e
*100644->100644 blob    7429b9c4d0aab2e4a494eb4b65129a59da138106->5b3090780d49cc610339a19f070a5954dce9a8bc      rev-tree.c
*100664->100644 blob    28a980482bf2053e022409cc3e50b2ad8adafd55->5b3090780d49cc610339a19f070a5954dce9a8bc      rev-tree.c

 <...>

As we walk the tree from the HEAD to its parents, we print only those
commits which modify the file(s) in question. We remember the last
commit we printed as we recurse, so that we can generate a complete
graph. The SHA-1s of the blobs themselves aren't good enough on their own
because they're not guaranteed to be unique -- if the same change
happens on two different branches, the SHA-1 will be the same, and we
won't know how it fits together.

As it is, it's not quite perfect because I'm still omitting merge
commits where the resulting file is identical to the same file in _all_
of the parents. So if we have the following tree (for the _file_):

       ----- (AB) ----,
      /                \ 
  (A) ------ (AB) ----- (AB) --,
      \                         \
       ----- (AC) --------------(ABC)

(Where the delta A->AB is a trivial one-line fix which two people
independently reproduce, then they merge their trees together)

.. the point where the two independent instances of (AB) are merged
together won't be shown in the output of the attached script. The output
would show only this:

       ----- (AB) ----,
      /                \ 
  (A) ------ (AB) ----- (ABC)
      \                /           
       ----- (AC) ----'

Do we care about this? Or is it good enough? I don't really want to emit
output for _every_ merge commit we traverse, just in _case_ it happens
to be relevant later. Should I just give in to the voices in my head which
are telling me I should throw the damn thing away and rewrite it in C?

Given this output, it should be possible to display a pretty graph of
the history of the file, and easily find both diffs and whole files.
Creating a graphical tool which does this is left as an exercise for the
reader.
 
-- 
dwmw2

[-- Attachment #2.1.2: gitfilelog.sh --]
[-- Type: application/x-shellscript, Size: 1983 bytes --]


* Re: Finding file revisions
  2005-04-27 22:19     ` Linus Torvalds
  2005-04-27 22:31       ` Chris Mason
  2005-04-28 11:45       ` Chris Mason
@ 2005-04-28 13:09       ` David Woodhouse
  2 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2005-04-28 13:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git

On Wed, 2005-04-27 at 15:19 -0700, Linus Torvalds wrote:
> Remember: time is not a nice linear stream.

Time is neither nice nor linear. Time is a complete illusion.

If _any_ of your tools are using the time for _any_ purpose other than
to display it to the user along with the author/committer information,
then you are probably making a mistake. 

Relative time does not represent the revision history of a distributed
system which supports merges. Any correlation you think you see is
_purely_ a coincidence.

-- 
dwmw2



* Re: Finding file revisions
  2005-04-28 11:56           ` Chris Mason
@ 2005-04-28 13:13             ` Simon Fowler
  0 siblings, 0 replies; 25+ messages in thread
From: Simon Fowler @ 2005-04-28 13:13 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 1772 bytes --]

On Thu, Apr 28, 2005 at 07:56:57AM -0400, Chris Mason wrote:
> On Thursday 28 April 2005 04:41, Simon Fowler wrote:
> > I've got a version of this written in C that I've been working on
> > for a bit - some example output:
> >
> > +040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
> > +100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
> > +100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
> > +100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
> > commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4
> >
> > You run it as:
> > find-changes commit_id file_prefix ...
> >
> > The file_prefix is a path prefix to match - it's not as flexible as
> > regexes, but it shouldn't be too much less useful.
> 
> I dropped the regexes for speed with diff-tree, they weren't that important to 
> me...The features I was going for are:
> 
> 1) ability to see the changeset comments in the output.

I'll add a -v option tomorrow (it's 11pm here) to show the commit
comments in the output, if you like the idea.

> 3) Some quick cut n' paste method to generate the changeset diff.  This is why 
> I do diff-tree -r in the output, so I can just copy into a different window 
> and go.
> 
Would the two commit sha1s space separated, rather than with the
'->', be better for that? I'm a little reluctant to have it output
the full 'diff-tree -r' thing, since it's inconsistent with the
other tools. 

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: Finding file revisions
  2005-04-27 18:41   ` Thomas Gleixner
@ 2005-04-28 15:24     ` Linus Torvalds
  2005-04-28 16:47       ` Thomas Gleixner
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2005-04-28 15:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Chris Mason, git



On Wed, 27 Apr 2005, Thomas Gleixner wrote:
>
> On Wed, 2005-04-27 at 10:34 -0700, Linus Torvalds wrote:
> > 
> > With more history, "rev-list" should do basically the right thing: it will
> > be constant-time for _recent_ commits, and it is linear time in how far
> > back you want to go. Which seems quite reasonable.
> 
> Which is quite horrible, if you have a 500k+ blobs repo.

It's _not_ linear in blobs. It doesn't care at all about them, in fact. 

It's linear in how many revisions you go backwards. And I claim that you 
can't do any better than that, without doing _really_ bad things.

> I know you are database allergic, but there a database is the correct
> solution.

I disagree. I'm not database allergic, I just don't believe in the notion 
that databases solve all the world's problems.

> Having stored all the relations of those file/tree/commit
> blobs in a database it takes <20ms to have a list of all those file
> blobs in historical order with some context information retrieved.

.. and such an SCM will _suck_ for anything else.

You just made creating a commit etc much slower. You now have to update 
per-file information that you never updated before, and look at 
information that git simply doesn't _care_ about. 

Right now, when we create a new version, it's pretty much instantaneous.  
Exactly because we do not look at a _single_ file, and we don't care how
they changed from the "previous" version. We just write out the knowledge
about what the files are now.

Doing a database of file changes would absolutely _suck_. Anybody who
thinks that databases are magically faster than not using a database
doesn't understand basic physics. Things don't go faster just because you
call it a database. Things go faster by _doing_less_.

Normally, a database does less by keeping indexes etc around, and the 
indexes require less work than the data itself. But git _does_ all of that 
already. Git very much _is_ a database, it's just a specialized one.

I dare you to show me wrong. I don't _care_ if you can show the revision
history of a single file in 20ms. The easiest way to do that is with a
delta format, where the file information basically is single-file in the
first place, and you just open the file and print out the results. Guess
what? We've had that. It's called RCS/SCCS/CVS, and it's a piece of total
and absolute crap. Exactly because single-file revisions simply do not
matter.

If you want to use a database, go wild. But use it as a _cache_. Then you 
can build up the database of file revisions "after the fact", and always 
know that your database is not the real data, it's just an index, and can 
be thrown away and regenerated at will.

That way you don't add overhead to the stuff that actually matters, and 
that git does a lot better than a general-purpose database could ever do.
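
A minimal sketch of the "index as a throwaway cache" idea, in Python and
under stated assumptions: the commit list stands in for the real
repository data, and build_index(), load_index() and the JSON cache file
are hypothetical names chosen for illustration. Nothing here is a git
interface.

```python
import json
import os
import tempfile


def build_index(commits):
    """Derive path -> [commit ids] from the primary data."""
    index = {}
    for commit_id, paths in commits:
        for path in paths:
            index.setdefault(path, []).append(commit_id)
    return index


def load_index(cache_file, commits):
    """Use the cache if it is readable, otherwise regenerate it."""
    try:
        with open(cache_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt cache: rebuild from the real data.
        index = build_index(commits)
        with open(cache_file, "w") as f:
            json.dump(index, f)
        return index


# Stand-in for the repository: newest commit first, with changed paths.
commits = [("c3", ["fs/ext3/inode.c"]),
           ("c2", ["fs/ext2/inode.c", "fs/ext3/inode.c"]),
           ("c1", ["Makefile"])]
cache_file = os.path.join(tempfile.mkdtemp(), "file-index.json")
first = load_index(cache_file, commits)   # built and written
os.remove(cache_file)                     # throw the cache away
second = load_index(cache_file, commits)  # regenerated from the real data
print(first == second, first["fs/ext3/inode.c"])  # True ['c3', 'c2']
```

Creating a commit never touches the index in this arrangement. A real
cache would also need a way to notice that it is stale, which this
sketch leaves out.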

			Linus


* Re: Finding file revisions
  2005-04-27 16:50 Finding file revisions Chris Mason
  2005-04-27 17:34 ` Linus Torvalds
@ 2005-04-28 16:08 ` Daniel Barkalow
  2005-04-28 17:05   ` Chris Mason
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel Barkalow @ 2005-04-28 16:08 UTC (permalink / raw)
  To: Chris Mason; +Cc: git

On Wed, 27 Apr 2005, Chris Mason wrote:

> I haven't seen a tool yet to find which changeset modified a given file, so 
> I whipped up something.  The basic idea is to:

What is the answer supposed to be in the presence of merges? It seems like
you shouldn't report the merge that brought in the change, but rather
(assuming it's available) the changeset that originally made it.

That is:

go through the history tree:
  if a commit has a parent with a different version:
    if it also has a parent with the same version as the child, ignore the
      different parent(s) and enqueue the same parent(s)
    otherwise, report it (for a single head, it's the original change; for
      a merge, it merged two changes to the file)
  otherwise, enqueue all the parents

Sorting by time is probably not useful, because there must be some source
of the current version, and all paths going back, after ignoring versions
that were replaced by it in a merge, must go back to that source, so
depth-first search is fastest. (If there are multiple possible solutions,
then it means that multiple people applied the same patch, and any of them
should do).
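
The walk above can be sketched as follows, in Python rather than C. The
history is assumed to be a plain dict mapping a commit name to its
parent names and the blob id of the file in that commit; file_history()
and that layout are hypothetical. Two points are assumptions beyond the
pseudocode: the walk continues into all parents after reporting a
commit, and root commits are never reported.

```python
def file_history(commits, head):
    """commits: {name: (parent_names, blob)}.
    Returns the names of reported commits in visit order."""
    reported = []
    seen = set()
    stack = [head]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        parents, blob = commits[name]
        same = [p for p in parents if commits[p][1] == blob]
        different = [p for p in parents if commits[p][1] != blob]
        if different and same:
            # Some parent already had this version: follow only that side.
            follow = same
        else:
            if different:
                # Every parent differs: this commit made or merged a change.
                reported.append(name)
            follow = parents
        # Reversed so that the first parent is explored first (depth-first).
        stack.extend(reversed(follow))
    return reported


history = {
    "root":  ([], "v1"),
    "fix":   (["root"], "v2"),
    "other": (["root"], "v1"),
    "merge": (["other", "fix"], "v2"),
    "tip":   (["merge"], "v2"),
}
print(file_history(history, "tip"))  # ['fix']
```

The merge is not reported because one of its parents already carries the
same version, so only the commit that originally made the change shows up.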

This should be easy in C, but difficult in something that isn't generating
the history info itself.

	-Daniel
*This .sig left intentionally blank*



* Re: Finding file revisions
  2005-04-28 11:45       ` Chris Mason
@ 2005-04-28 16:34         ` Kay Sievers
  2005-04-28 17:10           ` Tony Luck
  2005-04-28 19:11         ` Kay Sievers
  1 sibling, 1 reply; 25+ messages in thread
From: Kay Sievers @ 2005-04-28 16:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git

On Thu, 2005-04-28 at 07:45 -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached.  New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
> 
> One more rev that should work as you suggested.  Here's the example output
> from a cogito changeset with merges.  I print the diff-tree lines once for each 
> matching parent and then print the commit once.  It's very primitive, but
> hopefully some day someone will make a gui with happy clicky buttons
> for changesets and filerevs.

Not really happy clicky, but ... :)

Look at the (history) link:
  http://ehlo.org/~kay/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fb3b4ebc0be618dbcc2326482a83c920d51af7de

Kay



* Re: Finding file revisions
  2005-04-28 15:24     ` Linus Torvalds
@ 2005-04-28 16:47       ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2005-04-28 16:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git

On Thu, 2005-04-28 at 08:24 -0700, Linus Torvalds wrote:
> I disagree. I'm not database allergic, I just don't believe in the notion 
> that databases solve all the world's problems.

I never claimed they did.

> You just made creating a commit etc much slower. You now have to update 
> per-file information that you never updated before, and look at 
> information that git simply doesn't _care_ about. 

I did not say that such a feature should be included in git itself.
That was never my intention.

> what? We've had that. It's called RCS/SCCS/CVS, and it's a piece of total
> and absolute crap. Exactly because single-file revisions simply do not
> matter.

I agree that RCS is crap for distributed development, but seeing a
change in a file in the correct context is quite helpful at times.

> If you want to use a database, go wild. But use it as a _cache_. Then you 
> can build up the database of file revisions "after the fact", and always 
> know that your database is not the real data, it's just an index, and can 
> be thrown away and regenerated at will.

That's all I want to use it for: tracking information over
various repos and longer time intervals.

tglx




* Re: Finding file revisions
  2005-04-28 16:08 ` Daniel Barkalow
@ 2005-04-28 17:05   ` Chris Mason
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-28 17:05 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, David Woodhouse, Kay Sievers

On Thursday 28 April 2005 12:08, Daniel Barkalow wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > I haven't seen a tool yet to find which changeset modified a given file,
> > so I whipped up something.  The basic idea is to:
>
> What is the answer supposed to be in the presence of merges? It seems like
> you shouldn't report the merge that brought in the change, but rather
> (assuming it's available) the changeset that originally made it.

Based on comments from Linus I did make it a little more merge aware.  But 
since my tool was just to tide me over until someone fixed things in gui 
form, I didn't want to kill off too many brain cells coding it.

It sounds as though David's script already has more merge brains than mine,
and the git web stuff is pretty slick.  So it seems I didn't look hard enough 
before...

-chris


* Re: Finding file revisions
  2005-04-28 16:34         ` Kay Sievers
@ 2005-04-28 17:10           ` Tony Luck
  2005-04-28 17:22             ` Thomas Glanzmann
  0 siblings, 1 reply; 25+ messages in thread
From: Tony Luck @ 2005-04-28 17:10 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Chris Mason, Linus Torvalds, git

> Not really happy clicky, but ... :)
> 
> Look at the (history) link:
>   http://ehlo.org/~kay/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fb3b4ebc0be618dbcc2326482a83c920d51af7de

Looks very useful.  Would it be possible to display the date (from the
commit) instead of the 40-hex-char blobname (but have the link still
point to the blob).  Like this:

2005-04-27 [PATCH] USB: MODALIAS change for bcdDevice
2005-04-26 Merge with kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6.git/
2005-04-26 Merge with kernel.org:/pub/scm/linux/kernel/git/gregkh/aoe-2.6.git/

That way you'd trade some screen space that is filled with hex numbers for
some useful information.  Dates could either be absolute (as in my example),
or relative ("4 hours ago", "2 weeks ago", etc.)

-Tony


* Re: Finding file revisions
  2005-04-28 17:10           ` Tony Luck
@ 2005-04-28 17:22             ` Thomas Glanzmann
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Glanzmann @ 2005-04-28 17:22 UTC (permalink / raw)
  To: git

Hello,

> Looks very useful.  Would it be possible to display the date (from the
> commit) instead of the 40-hex-char blobname (but have the link still
> point to the blob).  Like this:

First of all there is a date on the site, and second I think the sha1
hash is much more useful than the date.

	Thomas


* Re: Finding file revisions
  2005-04-28 11:45       ` Chris Mason
  2005-04-28 16:34         ` Kay Sievers
@ 2005-04-28 19:11         ` Kay Sievers
  2005-04-28 20:58           ` Chris Mason
  1 sibling, 1 reply; 25+ messages in thread
From: Kay Sievers @ 2005-04-28 19:11 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git

On Thu, 2005-04-28 at 07:45 -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached.  New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
> 
> One more rev that should work as you suggested.  Here's the example output
> from a cogito changeset with merges.  I print the diff-tree lines once for each 
> matching parent and then print the commit once.  It's very primitive, but
> hopefully some day someone will make a gui with happy clicky buttons
> for changesets and filerevs.
> 
> diff-tree -r 2544d7558f0ce94ab9c163f5b67244f71d8c85b8 69eeae031bf5447e99b9274761e2361e8c5a944e
> 618fdb616cebbd2fc9f1cddc0b6b75fd575250a1->3579b5fd1182679a39b83eaaa9dd0e7c970f4545 diff-tree.c
> diff-tree -r 9831d8f86095edde393e495d7a55cab9d35d5d05 69eeae031bf5447e99b9274761e2361e8c5a944e
> 2d2913b6b98ac836b43755b1304d2a838dad87dd->4f01bbbbb3fd0e53e9ce968f167b6dae68fcfa92 Makefile
> cat-file commit 69eeae031bf5447e99b9274761e2361e8c5a944e
>     tree 7510dc1b63e9e690ec73952e40a31e43af4b55bc
>     parent 2544d7558f0ce94ab9c163f5b67244f71d8c85b8
>     parent 9831d8f86095edde393e495d7a55cab9d35d5d05
>     author Petr Baudis <pasky@ucw.cz> 1114544917 +0200
>     committer Petr Baudis <xpasky@machine.sinus.cz> 1114544917 +0200

Can you confirm this with the kernel tree? 
  file-changes -c 9acf6597c533f3d5c991f730c6a1be296679018e drivers/usb/core/usb.c

lists the commit:
  diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
  f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc drivers/usb/core/usb.c
  cat-file commit c79bea07ec4d3ef087962699fe8b2f6dc5ca7754

which seems not to have changed the file asked for.

Thanks,
Kay



* Re: Finding file revisions
  2005-04-28 19:11         ` Kay Sievers
@ 2005-04-28 20:58           ` Chris Mason
  2005-04-28 21:32             ` Linus Torvalds
  2005-04-28 21:33             ` Kay Sievers
  0 siblings, 2 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-28 20:58 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, git

On Thursday 28 April 2005 15:11, Kay Sievers wrote:
>
> Can you confirm this with the kernel tree?
>   file-changes -c 9acf6597c533f3d5c991f730c6a1be296679018e drivers/usb/core/usb.c
>
> lists the commit:
>   diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
>   f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc drivers/usb/core/usb.c
>   cat-file commit c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
>
> which seems not to have changed the file asked for.

Hmmm, that does work here:

coffee:/src/git # diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754 | grep usb.core.usb.c
*100644->100644 blob    f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc      drivers/usb/core/usb.c

-chris


* Re: Finding file revisions
  2005-04-28 20:58           ` Chris Mason
@ 2005-04-28 21:32             ` Linus Torvalds
  2005-04-28 21:33             ` Kay Sievers
  1 sibling, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2005-04-28 21:32 UTC (permalink / raw)
  To: Chris Mason; +Cc: Kay Sievers, git



On Thu, 28 Apr 2005, Chris Mason wrote:

> On Thursday 28 April 2005 15:11, Kay Sievers wrote:
> >
> > Can you confirm this with the kernel tree?
> >   file-changes -c 9acf6597c533f3d5c991f730c6a1be296679018e drivers/usb/core/usb.c
> >
> > lists the commit:
> >   diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
> > f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc drivers/usb/core/usb.c
> >
> >  cat-file commit c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
> >
> > which seems not to have changed the file asked for.
> 
> Hmmm, that does work here:
> 
> coffee:/src/git # diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754 | grep usb.core.usb.c
> *100644->100644 blob    f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc      drivers/usb/core/usb.c

I think Kay is confused by the fact that the commit is a -merge- commit, 
and the first parent has _not_ changed that file - it got changed through 
the merge.

Ie:

	cat-file commit c79bea07ec4d3ef087962699fe8b2f6dc5ca7754

gives

	tree 3fbdc4745cfde60df7d05815b343e4a253020530
	parent a9e4820c4c170b3df0d2185f7b4130b0b2daed2c
	parent 1d66c64c3cee10a465cd3f8bd9191bbeb718f650
	author Linus Torvalds <torvalds@ppc970.osdl.org.(none)> 1113921100 -0700
	committer Linus Torvalds <torvalds@ppc970.osdl.org.(none)> 1113921100 -0700

	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6.git/

and if you do a diff against the _first_ parent you don't see anything 
changing in USB..

		Linus


* Re: Finding file revisions
  2005-04-28 20:58           ` Chris Mason
  2005-04-28 21:32             ` Linus Torvalds
@ 2005-04-28 21:33             ` Kay Sievers
  2005-04-28 21:50               ` Linus Torvalds
  2005-04-28 22:27               ` Chris Mason
  1 sibling, 2 replies; 25+ messages in thread
From: Kay Sievers @ 2005-04-28 21:33 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git

On Thu, 2005-04-28 at 16:58 -0400, Chris Mason wrote:
> On Thursday 28 April 2005 15:11, Kay Sievers wrote:
> >
> > Can you confirm this with the kernel tree?
> >   file-changes -c 9acf6597c533f3d5c991f730c6a1be296679018e drivers/usb/core/usb.c
> >
> > lists the commit:
> >   diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
> >   f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc drivers/usb/core/usb.c
> >   cat-file commit c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
> >
> > which seems not to have changed the file asked for.
> 
> Hmmm, that does work here:
> 
> coffee:/src/git # diff-tree -r 1d66c64c3cee10a465cd3f8bd9191bbeb718f650 c79bea07ec4d3ef087962699fe8b2f6dc5ca7754 | grep usb.core.usb.c
> *100644->100644 blob    f0534ee064901d0108eb7b2b1fcb59a98bb53c2b->c231b4bef314284a168fedb6c5f6c47aec5084fc      drivers/usb/core/usb.c
> 
> -chris

Sure. But file-changes lists the commit:
  c79bea07ec4d3ef087962699fe8b2f6dc5ca7754

when asked for:
  "drivers/usb/core/usb.c"

and that file isn't touched there. Actually it lists merge-commits which
are not related to the file.

Thanks,
Kay




* Re: Finding file revisions
  2005-04-28 21:33             ` Kay Sievers
@ 2005-04-28 21:50               ` Linus Torvalds
  2005-04-28 22:27               ` Chris Mason
  1 sibling, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2005-04-28 21:50 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Chris Mason, git



On Thu, 28 Apr 2005, Kay Sievers wrote:
> 
> Sure. But file-changes lists the commit:
>   c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
> 
> when asked for:
>   "drivers/usb/core/usb.c"
> 
> and that file isn't touched there. Actually it lists merge-commits which
> are not related to the file.

It really _is_ touched by that commit. Look closer.

It has two parents: one that had already merged with Greg's USB tree, and 
one that had _not_ done so.

So whether it "modifies" the USB files or not really depends on which 
parent you go back. 

In general, you tend to want to ignore merge-nodes for looking at 
differences, but the differences are definitely there, and they are often 
vital (ie it's often _very_ important to know which side of a merge didn't 
change something).
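
A minimal sketch of the per-parent comparison, in Python, using the
commit and blob ids quoted earlier in this thread (abbreviated to eight
hex digits here for readability). The dict layout and
differs_per_parent() are hypothetical; a real tool would get the same
answer from diff-tree run against each parent in turn.

```python
def differs_per_parent(blobs, parents, commit, path):
    """For each parent of `commit`, report whether `path` differs."""
    mine = blobs[commit].get(path)
    return {p: blobs[p].get(path) != mine for p in parents[commit]}


path = "drivers/usb/core/usb.c"
parents = {"c79bea07": ["a9e4820c", "1d66c64c"]}
blobs = {
    "c79bea07": {path: "c231b4be"},  # the merge commit
    "a9e4820c": {path: "c231b4be"},  # first parent: already had the USB merge
    "1d66c64c": {path: "f0534ee0"},  # second parent: had not merged it yet
}
print(differs_per_parent(blobs, parents, "c79bea07", path))
# {'a9e4820c': False, '1d66c64c': True}
```

Keeping the answer per parent, instead of collapsing it to a single
"changed" flag, preserves the information about which side of the merge
did not change the file.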

		Linus


* Re: Finding file revisions
  2005-04-28 21:33             ` Kay Sievers
  2005-04-28 21:50               ` Linus Torvalds
@ 2005-04-28 22:27               ` Chris Mason
  1 sibling, 0 replies; 25+ messages in thread
From: Chris Mason @ 2005-04-28 22:27 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, git

On Thursday 28 April 2005 17:33, Kay Sievers wrote:
> Sure. But file-changes lists the commit:
>   c79bea07ec4d3ef087962699fe8b2f6dc5ca7754
>
> when asked for:
>   "drivers/usb/core/usb.c"
>
> and that file isn't touched there. Actually it lists merge-commits which
> are not related to the file.

Ok, this is what Daniel and David were talking about.  When we've got a commit
with multiple parents, we'll find the file at least one more time than it was
really changed.  Looking at the results on git web, it's easy to ignore the
merge sets as noise, but it would be nice if we only printed the merge set 
when it made some change to the file the original cset being merged did not.

I had misread your first mail, thinking that you had developed this 
independently and solved these issues ;)  The problem is that if we do a true 
depth first search, it seems like we'll have to keep a potentially unbounded 
amount of data in order to find the first changeset that happened to create a 
given sha1.  I'd really rather print the mergeset and let the user figure it 
out.

But, we're not really printing a merge set so much as we're printing the 
complete diff of what was merged.  Is there some way to see what changes had 
to be done in order to resolve conflicts during a merge?

-chris


Thread overview: 25+ messages
2005-04-27 16:50 Finding file revisions Chris Mason
2005-04-27 17:34 ` Linus Torvalds
2005-04-27 18:23   ` Chris Mason
2005-04-27 22:19     ` Linus Torvalds
2005-04-27 22:31       ` Chris Mason
2005-04-28  8:41         ` Simon Fowler
2005-04-28 11:56           ` Chris Mason
2005-04-28 13:13             ` Simon Fowler
2005-04-28 11:45       ` Chris Mason
2005-04-28 16:34         ` Kay Sievers
2005-04-28 17:10           ` Tony Luck
2005-04-28 17:22             ` Thomas Glanzmann
2005-04-28 19:11         ` Kay Sievers
2005-04-28 20:58           ` Chris Mason
2005-04-28 21:32             ` Linus Torvalds
2005-04-28 21:33             ` Kay Sievers
2005-04-28 21:50               ` Linus Torvalds
2005-04-28 22:27               ` Chris Mason
2005-04-28 13:09       ` David Woodhouse
2005-04-28 13:01     ` David Woodhouse
2005-04-27 18:41   ` Thomas Gleixner
2005-04-28 15:24     ` Linus Torvalds
2005-04-28 16:47       ` Thomas Gleixner
2005-04-28 16:08 ` Daniel Barkalow
2005-04-28 17:05   ` Chris Mason
