Git development


* [PATCH] Make diff-cache and friends output more cg-patch friendly.
From: Junio C Hamano @ 2005-04-28  6:28 UTC
  To: Linus Torvalds, Petr Baudis; +Cc: Andrew Morton, git
In-Reply-To: <7vhdhra2sg.fsf@assigned-by-dhcp.cox.net>

This patch changes the way the default arguments to diff are
built when diff-cache and friends are invoked with -p and there
is no GIT_EXTERNAL_DIFF environment variable.  It attempts to be
more cg-patch friendly by:

 - Showing diffs against /dev/null to denote added or removed
   files;

 - Showing file modes for existing files as a comment after the
   diff label.

Unfortunately with this change GIT_DIFF_CMD customization cannot
be supported easily anymore, so it has been dropped.
GIT_DIFF_OPTS customization to change diffs from unified to
context is still there, though.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c |   56 ++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 36 insertions(+), 20 deletions(-)

# - 04/27 21:50 diff.c clean up temporary file.
# + 04/27 23:18 Attempt to minimally be compatible with cg-Xdiffdo.
--- k/diff.c  (mode:100644)
+++ l/diff.c  (mode:100644)
@@ -7,7 +7,6 @@
 #include "cache.h"
 #include "diff.h"
 
-static char *diff_cmd = "diff -L'k/%s' -L'l/%s'";
 static char *diff_opts = "-pu";
 
 static const char *external_diff(void)
@@ -24,14 +23,12 @@ static const char *external_diff(void)
 	 * alternative styles you can specify via environment
 	 * variables are:
 	 *
-	 * GIT_DIFF_CMD="diff -L '%s' -L '%s'"
 	 * GIT_DIFF_OPTS="-c";
 	 */
 	if (getenv("GIT_EXTERNAL_DIFF"))
 		external_diff_cmd = getenv("GIT_EXTERNAL_DIFF");
 
 	/* In case external diff fails... */
-	diff_cmd = getenv("GIT_DIFF_CMD") ? : diff_cmd;
 	diff_opts = getenv("GIT_DIFF_OPTS") ? : diff_opts;
 
 	done_preparing = 1;
@@ -84,31 +81,50 @@ static struct diff_tempfile {
 static void builtin_diff(const char *name,
 			 struct diff_tempfile *temp)
 {
-	static char *diff_arg  = "'%s' '%s'";
-	const char *name_1_sq = sq_expand(temp[0].name);
-	const char *name_2_sq = sq_expand(temp[1].name);
+	int i, next_at;
+	const char *diff_cmd = "diff -L'%s%s%s' -L'%s%s%s'";
+	const char *diff_arg  = "'%s' '%s'";
+	const char *input_name_sq[2];
+	const char *path0[2];
+	const char *path1[2];
+	char mode[2][20];
 	const char *name_sq = sq_expand(name);
-
-	/* diff_cmd and diff_arg have 4 %s in total which makes
-	 * the sum of these strings 8 bytes larger than required.
+	char *cmd;
+	
+	/* diff_cmd and diff_arg have 8 %s in total which makes
+	 * the sum of these strings 16 bytes larger than required.
 	 * we use 2 spaces around diff-opts, and we need to count
-	 * terminating NUL, so we subtract 5 here.
+	 * terminating NUL, so we subtract 13 here.
 	 */
-	int cmd_size = (strlen(diff_cmd) + 
-			strlen(name_sq) * 2 +
-			strlen(diff_opts) +
-			strlen(diff_arg) +
-			strlen(name_1_sq) + strlen(name_2_sq)
-			- 5);
-	char *cmd = xmalloc(cmd_size);
-	int next_at = 0;
+	int cmd_size = (strlen(diff_cmd) + strlen(diff_opts) +
+			strlen(diff_arg) - 13);
+	for (i = 0; i < 2; i++) {
+		input_name_sq[i] = sq_expand(temp[i].name);
+		if (!strcmp(temp[i].name, "/dev/null")) {
+			path0[i] = "/dev/null";
+			path1[i] = "";
+			mode[i][0] = 0;
+		} else {
+			path0[i] = i ? "l/" : "k/";
+			path1[i] = name_sq;
+			sprintf(mode[i], "  (mode:%s)", temp[i].mode);
+		}
+		cmd_size += (strlen(path0[i]) + strlen(path1[i]) +
+			     strlen(mode[i]) + strlen(input_name_sq[i]));
+	}
+
+	cmd = xmalloc(cmd_size);
 
+	next_at = 0;
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
-			    diff_cmd, name_sq, name_sq);
+			    diff_cmd,
+			    path0[0], path1[0], mode[0],
+			    path0[1], path1[1], mode[1]);
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
 			    " %s ", diff_opts);
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
-			    diff_arg, name_1_sq, name_2_sq);
+			    diff_arg, input_name_sq[0], input_name_sq[1]);
+
 	execlp("/bin/sh","sh", "-c", cmd, NULL);
 }
 


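The label scheme described in the patch above can be sketched in shell.
This is only an illustration of the output format, not the C code from
the patch; the function name diff_label and its argument order are made
up for the example, and the temporary file names are placeholders.

```bash
# Illustrative only: diff_label and its argument order are hypothetical.
# usage: diff_label <side: 0|1> <path> <tempfile> <mode>
diff_label() {
	if [ "$3" = "/dev/null" ]; then
		# Added or removed file: the label is just /dev/null, no mode.
		printf '%s\n' "/dev/null"
	elif [ "$1" -eq 0 ]; then
		# Old side of an existing file: k/ prefix plus the mode comment.
		printf 'k/%s  (mode:%s)\n' "$2" "$4"
	else
		# New side of an existing file: l/ prefix plus the mode comment.
		printf 'l/%s  (mode:%s)\n' "$2" "$4"
	fi
}

echo "--- $(diff_label 0 diff.c /tmp/old 100644)"
echo "+++ $(diff_label 1 diff.c /tmp/new 100644)"
echo "+++ $(diff_label 1 diff.c /dev/null '')"
```

The first two lines printed have the same shape as the "--- k/diff.c"
and "+++ l/diff.c" labels in the patch itself; the third shows the label
for a removed file.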

* Re: PATCH[0/4]: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-28  7:11 UTC
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427232223.GM22956@pasky.ji.cz>

Petr Baudis wrote:

>Could you please functionally split and sign off your patch?
>  
>
Apologies, I'm still a bit new to LKML etiquette...

>Also, I'd prefer not to have the sha1 completion logic duplicated; what
>about just having commit-id take a parameter not to validate its id?
>Actually, that's ugly too. I think the cleanest solution would be to
>reintroduce the cg-Xnormid, now to only really do the _common_ stuff -
>basically everything up to the typecheck (exclusively) in commit-id.
>  
>
OK, so following this will be a new set of patches (I assume you mean 
one patch per file changed when you asked for a "functional split") that 
re-introduce cg-Xnormid, and then convert commit-id, tree-id, and 
parent-id to use the new core.

In re-writing these, I've put great effort into making the scripts *not* 
exec a sub-shell or process, so the bash constructs may look strange. 
I'm sure Linux can exec very quickly, but bash parses and executes even 
faster when it doesn't have to fork.
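
A minimal sketch of the kind of substitution this refers to, assuming a
temporary file standing in for .git/HEAD and a made-up ID. The variable
names are hypothetical and the snippet is not taken from the patches.

```bash
# Illustrative only: the temporary file stands in for .git/HEAD.
headfile=$(mktemp)
echo "0123456789abcdef0123456789abcdef01234567" > "$headfile"

# Forking version: command substitution plus an external cat.
id_forked=$(cat "$headfile")

# Builtin version: read is a shell builtin, so no process is spawned.
read id_builtin < "$headfile"

# Forking version: a pipeline into grep just to test the shape of the ID.
forked_ok=no
if echo "$id_forked" | grep -Eq '^[0-9a-f]{40}$'; then
	forked_ok=yes
fi

# Builtin version: a length check plus a glob that rejects non-hex chars.
builtin_ok=no
if [ ${#id_builtin} -eq 40 ] && [[ "$id_builtin" != *[!0-9a-f]* ]]; then
	builtin_ok=yes
fi

echo "$forked_ok $builtin_ok"
rm -f "$headfile"
```

Both variants should reach the same answer; the builtin forms simply
avoid the extra processes.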


* Re: PATCH[1/4]: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-28  7:14 UTC
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427232223.GM22956@pasky.ji.cz>

[-- Attachment #1: Type: text/plain, Size: 119 bytes --]

Patch to re-introduce cg-Xnormid for common ID normalizing.

Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>



[-- Attachment #2: cogito-0.8-cg-Xnormid.patch --]
[-- Type: text/plain, Size: 1962 bytes --]

Index: cg-Xnormid
===================================================================
--- /dev/null  (tree:6ad600e20c89323c1d3049f75b8ca9b0a2d72167)
+++ uncommitted/cg-Xnormid  (mode:100755 sha1:6dc089c8d571f330e2e01d96f79616d6146840ee)
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+#
+# Normalize an ID to an SHA1 hash value
+#    Strings resolve in order:
+#       NULL, this, HEAD  => .git/HEAD
+#       <tags>
+#       <heads>
+#       short SHA1 (4 or more hex digits)
+#
+# Copyright (c) Philip Pokorny, 2005
+
+id="$1"
+
+if [ ! "$id" ] || [ "$id" = "this" ] || [ "$id" = "HEAD" ]; then
+	read id < .git/HEAD
+
+elif [ -r ".git/refs/tags/$id" ]; then
+	read id < ".git/refs/tags/$id"
+
+elif [ -r ".git/refs/heads/$id" ]; then
+	read id < ".git/refs/heads/$id"
+
+# Short id's must be lower case and at least 4 digits.
+elif [[ "$id" == [0-9a-z][0-9a-z][0-9a-z][0-9a-z]* ]]; then
+	idpost=${id#??}
+	idpref=${id%$idpost}
+
+	# Assign array elements to matching names
+	idmatch=(.git/objects/$idpref/$idpost*)
+
+	if [ ${#idmatch[*]} -eq 1 ] && [ -r "$idmatch" ]; then
+		id=$idpref${idmatch#.git/objects/$idpref/}
+	elif [ ${#idmatch[*]} -gt 1 ]; then
+		echo "Ambiguous id: $id" >&2
+		exit 1
+	fi
+fi
+
+# FIXME? Should we verify the existence of the ID in the object cache?
+
+# If we don't have a 40-char ID by now, it's an error
+if [ ${#id} -ne 40 ]; then
+	echo "Invalid id: $id" >&2
+	exit 1
+fi
+
+echo $id
Index: Makefile
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/Makefile  (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile  (mode:100664)
@@ -44,7 +44,7 @@
 	cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
 	cg-export cg-help cg-init cg-log cg-ls cg-merge cg-mkpatch \
 	cg-patch cg-pull cg-branch-add cg-branch-ls cg-rm cg-seek cg-status \
-	cg-tag cg-update cg-Xlib
+	cg-tag cg-update cg-Xlib cg-Xnormid
 
 COMMON=	read-cache.o
 

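The short-ID branch of cg-Xnormid above can be tried in isolation. The
sketch below assumes a throwaway directory in place of .git/objects and
a single made-up object whose name is borrowed from the patch header;
expand_short_id is a hypothetical helper, not part of Cogito.

```bash
# Illustrative only: a throwaway object directory with one empty "object".
objdir=$(mktemp -d)
full=6dc089c8d571f330e2e01d96f79616d6146840ee
mkdir -p "$objdir/${full%${full#??}}"
: > "$objdir/${full%${full#??}}/${full#??}"

# Hypothetical helper. usage: expand_short_id <objdir> <short id>
expand_short_id() {
	# Everything after the first two characters, then the first two.
	local idpost=${2#??}
	local idpref=${2%"$idpost"}
	# Glob for objects whose name starts with the rest of the short ID.
	local idmatch=("$1/$idpref/$idpost"*)
	if [ ${#idmatch[@]} -eq 1 ] && [ -r "${idmatch[0]}" ]; then
		echo "$idpref${idmatch[0]#"$1/$idpref/"}"
	else
		# No match (the glob stayed literal) or more than one match.
		return 1
	fi
}

resolved=$(expand_short_id "$objdir" 6dc0)
echo "$resolved"
rm -rf "$objdir"
```

When nothing matches, the unexpanded glob pattern is left as the only
array element, which is why the readability check is needed alongside
the element count.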

* Re: PATCH[3/4]: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-28  7:16 UTC
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427232223.GM22956@pasky.ji.cz>

[-- Attachment #1: Type: text/plain, Size: 110 bytes --]

Convert tree-id to use cg-Xnormid to normalize an ID

Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>

[-- Attachment #2: cogito-0.8-tree-id.patch --]
[-- Type: text/plain, Size: 1175 bytes --]

Index: tree-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/tree-id  (mode:100755 sha1:1495ff78af71b57e21653512932bcda88fe05454)
+++ uncommitted/tree-id  (mode:100775)
@@ -1,17 +1,31 @@
 #!/usr/bin/env bash
 #
 # Get ID of tree associated with given commit or HEAD.
+#
 # Copyright (c) Petr Baudis, 2005
 #
-# Takes ID of the appropriate commit, defaults to HEAD.
 
-SHA1="[A-Za-z0-9]{40}"
-TREE="^tree $SHA1$"
+# Save for later error message
+orig="$1"
 
-id=$(cat-file commit $(commit-id "$1") | egrep "$TREE" | cut -d ' ' -f 2)
+# Normalize to SHA1 form
+id=$(cg-Xnormid "$1")
+if [ ! "$id" ]; then
+	# cg-Xnormid already reported the error
+	exit 1
+fi
+
+# Is it a commit?
+idtype=$(cat-file -t $id 2>/dev/null)
+if [ "$idtype" = "commit" ]; then
+	# Get the tree
+	id=$(cat-file commit "$id" | sed -e 's/tree //;q')
+fi
 
-if [ "$(cat-file -t "$id")" != "tree" ]; then
-	echo "Invalid id: $id" >&2
+# cat-file will verify that $id is a valid SHA1 ID for us
+# If it isn't, we'll get '' back
+if [ "$(cat-file -t "$id" 2>/dev/null)" != "tree" ]; then
+	echo "Invalid tree id: $orig" >&2
 	exit 1
 fi
 

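The sed expression used in the new tree-id relies on the tree line being
the first line of a commit object. A small sketch on made-up commit
text (the IDs and the author are placeholders, not real objects):

```bash
# Illustrative only: placeholder IDs and a made-up commit body.
tree_id=1111111111111111111111111111111111111111
parent_id=2222222222222222222222222222222222222222
commit_text="tree $tree_id
parent $parent_id
author A U Thor <author@example.com> 1114672560 -0700
committer A U Thor <author@example.com> 1114672560 -0700

tree is also a word that can start a log message line"

# Strip the "tree " prefix from the first line, print it, and quit, so
# nothing after the first line is ever looked at.
tree=$(printf '%s\n' "$commit_text" | sed -e 's/tree //;q')
echo "$tree"
```

Because sed quits after the first line, a log message that happens to
start a line with "tree " cannot be picked up by mistake.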

* Re: PATCH[2/4]: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-28  7:15 UTC
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427232223.GM22956@pasky.ji.cz>

[-- Attachment #1: Type: text/plain, Size: 117 bytes --]

Convert commit-id to use the new cg-Xnormid internal script

Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>

[-- Attachment #2: cogito-0.8-commit-id.patch --]
[-- Type: text/plain, Size: 1562 bytes --]

Index: commit-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/commit-id  (mode:100755 sha1:4efcb6bdfdb2b2c5744f5d4d47d92beb7777ed59)
+++ uncommitted/commit-id  (mode:100775)
@@ -1,39 +1,25 @@
 #!/usr/bin/env bash
 #
 # Get ID of commit associated with given id or HEAD.
+#
 # Copyright (c) Petr Baudis, 2005
 #
-# Takes the appropriate ID, defaults to HEAD.
-
-SHA1="[A-Za-z0-9]{40}"
-SHA1ONLY="^$SHA1$"
-
-id=$1
-if [ ! "$id" ] || [ "$id" = "this" ] || [ "$id" = "HEAD" ]; then
-	id=$(cat .git/HEAD)
-fi
 
-if (echo $id | egrep -vq "$SHA1ONLY") && [ -r ".git/refs/tags/$id" ]; then
-	id=$(cat ".git/refs/tags/$id")
-fi
-
-if (echo $id | egrep -vq "$SHA1ONLY") && [ -r ".git/refs/heads/$id" ]; then
-	id=$(cat ".git/refs/heads/$id")
-fi
+# Save for later error message
+orig="$1"
 
-idpref=$(echo "$id" | cut -c -2)
-idpost=$(echo "$id" | cut -c 3-)
-if [ $(find ".git/objects/$idpref" -name "$idpost*" 2>/dev/null | wc -l) -eq 1 ]; then
-	id=$idpref$(basename $(echo .git/objects/$idpref/$idpost*))
-fi
+# Normalize to SHA1 form
+id=$(cg-Xnormid "$orig")
 
-if echo $id | egrep -vq "$SHA1ONLY"; then
-	echo "Invalid id: $id" >&2
+if [ ! "$id" ]; then
+	# cg-Xnormid already reported the error
 	exit 1
 fi
 
-if [ "$(cat-file -t "$id")" != "commit" ]; then
-	echo "Invalid id: $id" >&2
+# cat-file will verify that $id is a valid SHA1 ID for us
+# If it isn't, we'll get '' back
+if [ "$(cat-file -t "$id" 2>/dev/null)" != "commit" ]; then
+	echo "Invalid commit id: $orig" >&2
 	exit 1
 fi
 


* Re: PATCH[4/4]: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-28  7:17 UTC
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427232223.GM22956@pasky.ji.cz>

[-- Attachment #1: Type: text/plain, Size: 204 bytes --]

Convert parent-id to similar style and function as the new commit-id and 
tree-id.

NOTE: parent-id uses commit-id rather than cg-Xnormid directly

Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>

[-- Attachment #2: cogito-0.8-parent-id.patch --]
[-- Type: text/plain, Size: 889 bytes --]

Index: parent-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/parent-id  (mode:100755 sha1:f35877a6aa5b68d2fb4a388dcfa9b3e64262604e)
+++ uncommitted/parent-id  (mode:100775)
@@ -1,12 +1,19 @@
 #!/usr/bin/env bash
 #
 # Get ID of parent commit to a given revision or HEAD.
+# NOTE: will return multiple SHA1s if ID is a commit with multiple parents
+#
 # Copyright (c) Petr Baudis, 2005
 #
-# Takes ID of the current commit, defaults to HEAD.
 
-PARENT="^parent [A-Za-z0-9]{40}$"
+# Save for later error message
+orig="$1"
 
-id=$(commit-id $1) || exit 1
+# Normalize to SHA1 form and verify it's a commit
+id=$(commit-id "$1")
+if [ ! "$id" ]; then
+	# commit-id already reported the error
+	exit 1
+fi
 
-cat-file commit $id | egrep "$PARENT" | cut -d ' ' -f 2
+cat-file commit $id | awk '/^parent/{print $2};/^$/{exit}'

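The awk program in the new parent-id stops at the blank line that ends
the commit header. A small sketch on made-up commit text with two
parents (the IDs and the author are placeholders):

```bash
# Illustrative only: placeholder IDs and a made-up merge commit.
p1=2222222222222222222222222222222222222222
p2=3333333333333333333333333333333333333333
commit_text="tree 1111111111111111111111111111111111111111
parent $p1
parent $p2
author A U Thor <author@example.com> 1114672560 -0700
committer A U Thor <author@example.com> 1114672560 -0700

parent lines in the log message must not be reported"

# Print the second field of every header line starting with "parent",
# and stop reading at the first empty line.
parents=$(printf '%s\n' "$commit_text" |
	awk '/^parent/{print $2};/^$/{exit}')
echo "$parents"
```

This prints one ID per parent, so a merge commit yields several lines,
as the NOTE added to the script header says.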

* Re: kernel.org now has gitweb installed
From: David Woodhouse @ 2005-04-28  7:35 UTC
  To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <42703E79.8050808@zytor.com>

On Wed, 2005-04-27 at 18:38 -0700, H. Peter Anvin wrote:
> http://www.kernel.org/git/

Looks like the ordering is wrong. A chronological sort means that
commits which were made three weeks ago, but which Linus only pulled
yesterday, do not show up at the top of the tree.

-- 
dwmw2


* Re: I'm missing isofs.h
From: Petr Baudis @ 2005-04-28  7:52 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Andrew Morton, git
In-Reply-To: <7vhdhra2sg.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Thu, Apr 28, 2005 at 07:27:59AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> PB> Actually, I can't; the patch generator is not on par with mine yet.
> PB> It does not show modes and does not indicate file adds/removals by
> PB> /dev/null - basically, I need something cg-patch can eat (and it should
> PB> be backwards compatible). I think throwing the sha1 hashes away will not
> PB> harm; I got used to the Index: field and === marker, but I don't care if
> PB> I lose it.
> 
> I've looked at what cg-Xdiffdo does.  From the above paragraph,
> I sense that it does more than what cg-patch requires, so I took
> a look at cg-patch, too.  

Yes; that was what the last sentence was about. ;-)

> Can you help me verify if I understand the requirements cg-patch
> has on its input correctly?
> 
>  - Follow the convention of showing newly added files with
>    "--- /dev/null" and removed files with "+++ /dev/null";

Yes.

>  - Label matches this Perl regexp:
> 
>      m|^(---|\+\+\+)\s+[^/]+\/(\S+)\s+.*mode:([0-7]{3,}).*/|
> 
>    and you only care about sign ($1), filename ($2) and mode ($3).

Yes..

>  (modified files)
>  --- a/fs/ext3/Makefile  (mode:0644)
>  +++ b/fs/ext3/Makefile  (mode:0664)
> 
>  (deleted files)
>  --- a/fs/ext3/Makefile  (mode:0644)
>  +++ /dev/null
> 
>  (added files)
>  --- /dev/null
>  +++ b/fs/ext3/Makefile  (mode:0644)
> 
> Is my understanding correct?  If so it should not be too much
> work to generate something like it from within the builtin
> stuff.

Yes, perfectly.
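
A sketch of a parser for the label format agreed on above, written with
bash's regex matching. The pattern is a loose translation of the Perl
one quoted earlier, not a copy of what cg-patch runs, and parse_label is
a hypothetical name.

```bash
# Illustrative only: a bash ERE approximating the quoted Perl pattern.
label_re='^(---|\+\+\+)[[:space:]]+[^/]+/([^[:space:]]+)[[:space:]]+.*mode:([0-7]{3,})'

# Hypothetical helper. usage: parse_label <diff label line>
parse_label() {
	if [[ "$1" =~ $label_re ]]; then
		# Sign, filename without its first path component, and mode.
		echo "sign=${BASH_REMATCH[1]} file=${BASH_REMATCH[2]} mode=${BASH_REMATCH[3]}"
	elif [[ "$1" == "--- /dev/null"* || "$1" == "+++ /dev/null"* ]]; then
		# Added or removed file: no mode is expected on this side.
		echo "sign=${1%% *} file=/dev/null"
	else
		return 1
	fi
}

parse_label '--- a/fs/ext3/Makefile  (mode:0644)'
parse_label '+++ /dev/null'
```

The /dev/null case is handled separately because such a label has no
leading directory component for the main pattern to strip.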

> Provided if that is what the kernel folks can live with (I do
> see why the tool wants the mode bits, but it is unusual to see
> non-timestamp strings after filenames).

There's no reason not to get the timestamps too if you can; just put
them after the attributes. They aren't in the diff now either.

I need the mode bits to set the mode right, surprisingly. :-) Yes, in
part it is a leftover from the old times when we didn't just track the
execute bit; I don't know if it is worth changing this.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: kernel.org now has gitweb installed
From: Petr Baudis @ 2005-04-28  8:10 UTC
  To: David Woodhouse; +Cc: H. Peter Anvin, Git Mailing List
In-Reply-To: <1114673723.12012.324.camel@baythorne.infradead.org>

Dear diary, on Thu, Apr 28, 2005 at 09:35:23AM CEST, I got a letter
where David Woodhouse <dwmw2@infradead.org> told me that...
> On Wed, 2005-04-27 at 18:38 -0700, H. Peter Anvin wrote:
> > http://www.kernel.org/git/
> 
> Looks like the ordering is wrong. A chronological sort means that
> commits which were made three weeks ago, but which Linus only pulled
> yesterday, do not show up at the top of the tree.

  Linus                     ASM (Anonymous Subsystem Maintainer)

    |------------------------.
   A|                        |B
    |                        |
    |                        \-------------\
    |                        :             |
    \------------------------\             |E
   C|                        |D            |
    |                        /-------------/
    |                        |F
    /------------------------/

How would you show that? F E D C B A? F D C A E B?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Cogito nit: cg-update should default to "origin".
From: Dan Holmsand @ 2005-04-28  8:22 UTC
  To: git
In-Reply-To: <42705F3C.1000208@dwheeler.com>

David A. Wheeler wrote:
> So, how about this:
> 
> cg-revert [FILE...] or
> cg-revert [-d|--deleted]|[-a|--all]
>   Reverts some/all files back to the HEAD's state, eliminating changes

That's very good (and much better than my idea).

/dan


* Re: kernel.org now has gitweb installed
From: David Woodhouse @ 2005-04-28  8:29 UTC
  To: Petr Baudis; +Cc: H. Peter Anvin, Git Mailing List
In-Reply-To: <20050428081005.GG8612@pasky.ji.cz>

On Thu, 2005-04-28 at 10:10 +0200, Petr Baudis wrote:
>   Linus                     ASM (Anonymous Subsystem Maintainer)
> 
>     |------------------------.
>    A|                        |B
>     |                        |
>     |                        \-------------\
>     |                        :             |
>     \------------------------\             |E
>    C|                        |D            |
>     |                        /-------------/
>     |                        |F
>     /------------------------/
> 
> How would you show that? F E D C B A? F D C A E B?

Let us assume that C and A were already in Linus' tree (and on our web
page) yesterday. Thus, they should be last. The newly-pulled stuff
should be first -- FEDBCA.

I'd say "depth-first, remote parent first" but that would actually show
'A' (as a parent of D) long before it shows C. Walking of remote
parents should stop as soon as we hit a commit which was accessible
through a more local parent, rather than as soon as we hit a commit
which we've already printed. Maybe it should be something like depth-
first, local parent first, but _reversed_?

The latter is what the mailing list feeder does, but that has the
advantage of being able to use 'rev-tree $today ^$yesterday' so we
_know_ we're excluding the ones people have already seen. Hence I
haven't really paid that much attention to getting the order strictly
correct.

(Yes, I know that strictly speaking, git has no concept of 'remote' or
'local' parents. But the ordering of the two parents in a Cogito merge
or pull hasn't changed, has it?)
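
The "depth-first, local parent first, but reversed" idea can be tried
on the example history. The sketch below makes several assumptions that
are not in the diagram: the final merge into Linus' tree is given the
name M, each commit's parents are listed local first (so D is B then A,
and F is D then E), and the common root is left out.

```bash
# Illustrative only: M and the parent ordering are assumptions.
parents_of() {
	case "$1" in
	M) echo "C F" ;;
	C) echo "A" ;;
	F) echo "D E" ;;
	D) echo "B A" ;;
	E) echo "B" ;;
	*) echo "" ;;
	esac
}

seen=" "
order=""

# Depth-first walk, local parent first. A commit is emitted only after
# all of its parents; prepending to $order reverses the list as we go.
walk() {
	case "$seen" in
	*" $1 "*) return ;;
	esac
	seen="$seen$1 "
	for p in $(parents_of "$1"); do
		walk "$p"
	done
	order="$1${order:+ $order}"
}

walk M
echo "$order"
```

Under these assumptions the walk prints "M F E D B C A", which is the
FEDBCA order asked for above with the merge itself in front.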

-- 
dwmw2


* Re: A shortcoming of the git repo format
From: Morgan Schweers @ 2005-04-28  8:31 UTC
  To: git; +Cc: Linus Torvalds
In-Reply-To: <Pine.LNX.4.58.0504271722260.18901@ppc970.osdl.org>

Greetings,

This is off topic, but this is a great paragraph, and an incredibly
concise and valuable lesson for pre-architect software developers.

On 4/27/05, Linus Torvalds <torvalds@osdl.org> wrote:

[...deletia...]

>    Doing development is a lot about communication. Writing code in many
>    ways is secondary - it's much more important to try to make sure that
>    everybody knows what the goals are, because the _real_ pain in
>    development ends up being not the coding, but the much more fundamental
>    disagreements that happen when people really have totally different
>    expectations of what the end result is going to be.

[...deletia...]

>                         Linus

--  Morgan Schweers


* Re: Finding file revisions
From: Simon Fowler @ 2005-04-28  8:41 UTC
  To: Chris Mason; +Cc: Linus Torvalds, git
In-Reply-To: <200504271831.47830.mason@suse.com>


[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]

On Wed, Apr 27, 2005 at 06:31:47PM -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached.  New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
> 
> Aha, didn't realize that one.  Thanks, I'll rework things here.
> 
I've got a version of this written in C that I've been working on
for a bit - some example output:

+040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
+100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
+100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
+100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4

You run it as:
find-changes commit_id file_prefix ...

The file_prefix is a path prefix to match - it's not as flexible as
regexes, but it shouldn't be too much less useful.
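
A sketch of what matching on a path prefix could look like, using the
paths from the example output above. This is a guess at the matching
rule as described in this message; matches_prefix is a hypothetical
helper and the attached patch may behave differently.

```bash
# Illustrative only: matches_prefix is a hypothetical helper.
# usage: matches_prefix <prefix> <path>
matches_prefix() {
	case "$2" in
	"$1"*) return 0 ;;
	*) return 1 ;;
	esac
}

for path in ppc/sha1.c ppc/sha1.h ppc/sha1ppc.S diff.c; do
	if matches_prefix ppc/ "$path"; then
		echo "match: $path"
	fi
done
```

Giving the prefix a trailing slash limits the match to files under that
directory; a bare "ppc" would also match a file named, say, ppcfoo.c.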

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #1.2: find-changes.diff --]
[-- Type: text/plain, Size: 8905 bytes --]

Find commits that changed files matching the prefix given on the command line.

Signed-off-by: Simon Fowler <simon@dreamcraft.com.au>
---

Index: Makefile
===================================================================
--- c3aa1e6b53cc59d5fbe261f3f859584904ae3a63/Makefile  (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile  (mode:100644)
@@ -38,7 +38,7 @@
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper find-changes
 
 SCRIPT=	commit-id tree-id parent-id cg-Xdiffdo cg-Xmergefile \
 	cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
Index: find-changes.c
===================================================================
--- /dev/null  (tree:c3aa1e6b53cc59d5fbe261f3f859584904ae3a63)
+++ uncommitted/find-changes.c  (mode:100644 sha1:64c0c3627d84969ee1596b05f97705455fba1871)
@@ -0,0 +1,279 @@
+/*
+ * find-changes.c - find the commits that changed a particular file.
+ */
+
+#include "cache.h"
+//#include "revision.h"
+#include "commit.h"
+#include <sys/param.h>
+
+/* 
+ * This is a simple tool that walks through the revisions cache and
+ * checks the parent-child diffs to see if they include the given
+ * filename. 
+ */
+
+static int recursive = 1;
+static int found = 0;
+
+static char *malloc_base(const char *base, const char *path, int pathlen)
+{
+	int baselen = strlen(base);
+	char *newbase = malloc(baselen + pathlen + 2);
+	memcpy(newbase, base, baselen);
+	memcpy(newbase + baselen, path, pathlen);
+	memcpy(newbase + baselen + pathlen, "/", 2);
+	return newbase;
+}
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+	void *buf = *bufp;
+	unsigned long size = *sizep;
+	int len = strlen(buf) + 1 + 20;
+
+	if (size < len)
+		die("corrupt tree file");
+	*bufp = buf + len;
+	*sizep = size - len;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+{
+	int len = strlen(tree)+1;
+	const unsigned char *sha1 = tree + len;
+	const char *path = strchr(tree, ' ');
+
+	if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+		die("corrupt tree file");
+	*pathp = path+1;
+	return sha1;
+}
+
+static int check_file(void *tree, unsigned long size, const char *base, const char *target);
+
+/* A whole sub-tree went away or appeared */
+static int check_tree(void *tree, unsigned long size, const char *base, const char *target)
+{
+	int retval = 0;
+
+	while (size && !retval) {
+		retval = check_file(tree, size, base, target);
+		update_tree_entry(&tree, &size);
+	}
+	return retval;
+}
+
+/* A file entry went away or appeared.
+ * Check the entire subtree under this, and set the global 'found' flag
+ * if we find the target. */
+static int check_file(void *tree, unsigned long size, const char *base, const char *target)
+{
+	unsigned mode;
+	const char *path;
+	char full_path[MAXPATHLEN + 1];
+	int pathlen, retval;
+	const unsigned char *sha1 = extract(tree, size, &path, &mode);
+
+	pathlen = snprintf(full_path, MAXPATHLEN, "%s%s", base, path);
+	if (!cache_name_compare(full_path, pathlen, target, strlen(target)))
+		found = 1;
+
+	if (recursive && S_ISDIR(mode)) {
+		char type[20];
+		unsigned long size;
+		char *newbase = malloc_base(base, path, strlen(path));
+		void *tree;
+
+		tree = read_sha1_file(sha1, type, &size);
+		if (!tree || strcmp(type, "tree"))
+			die("corrupt tree sha %s", sha1_to_hex(sha1));
+
+		retval = check_tree(tree, size, newbase, target);
+		
+		free(tree);
+		free(newbase);
+		return retval;
+	}
+	return 0;
+}
+	
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, const char *target);
+
+/* the diff-tree algorithm depends on compare_tree_entry returning basically
+ * the same thing that memcmp() would on the filenames - this is important
+ * because the directories are sorted, and hence you need to decide what */
+static int compare_tree_entry(void *tree1, unsigned long size1, 
+			      void *tree2, unsigned long size2, 
+			      const char *base, const char *target)
+{
+	unsigned mode1, mode2;
+	const char *path1, *path2;
+	const unsigned char *sha1, *sha2;
+	int cmp, pathlen1, pathlen2;
+
+	if (found)
+		return 0;
+
+	sha1 = extract(tree1, size1, &path1, &mode1);
+	sha2 = extract(tree2, size2, &path2, &mode2);
+
+	pathlen1 = strlen(path1);
+	pathlen2 = strlen(path2);
+	cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
+	/* these files are different - if this is a directory then the
+	 * contents of the subtree are all different. So, we need to
+	 * run over the subtree and see if our target is in there
+	 * . . . */
+	if (cmp) {
+		check_file(tree1, size1, base, target);
+		check_file(tree2, size2, base, target);
+		return cmp;
+	}
+
+	if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
+		return 0;
+
+	/*
+	 * If the filemode has changed to/from a directory from/to a regular
+	 * file, we need to consider it a remove and an add.
+	 */
+	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
+		check_file(tree1, size1, base, target);
+		check_file(tree2, size2, base, target);
+		return 0;
+	}
+
+	if (recursive && S_ISDIR(mode1)) {
+		int retval;
+		char *newbase = malloc_base(base, path1, pathlen1);
+		retval = diff_tree_sha1(sha1, sha2, newbase, target);
+		free(newbase);
+		return retval;
+	}
+	
+	check_file(tree1, size1, base, target);
+	check_file(tree2, size2, base, target);
+	return 0;
+}
+
+static int diff_tree(void *tree1, unsigned long size1, void *tree2, unsigned long size2, 
+		     const char *base, const char *target)
+{
+	while (size1 | size2) {
+		if (!size1) {
+			check_file(tree2, size2, base, target);
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		if (!size2) {
+			check_file(tree1, size1, base, target);
+			update_tree_entry(&tree1, &size1);
+			continue;
+		}
+		switch (compare_tree_entry(tree1, size1, tree2, size2, base, target)) {
+		case -1:
+			update_tree_entry(&tree1, &size1);
+			continue;
+		case 0:
+			update_tree_entry(&tree1, &size1);
+			/* Fallthrough */
+		case 1:
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		die("diff-tree: internal error");
+	}
+	return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base,
+			  const char *target)
+{
+	void *tree1, *tree2;
+	unsigned long size1, size2;
+	char type[20];
+	int retval;
+
+	tree1 = read_sha1_file(old, type, &size1);
+	if (!tree1 || strcmp(type, "tree"))
+		die("unable to read source tree %s", sha1_to_hex(old));
+	tree2 = read_sha1_file(new, type, &size2);
+	if (!tree2 || strcmp(type, "tree"))
+		die("unable to read destination tree %s", sha1_to_hex(new));
+	retval = diff_tree(tree1, size1, tree2, size2, base, target);
+	free(tree1);
+	free(tree2);
+	return retval;
+}
+
+static int process_diffs(struct commit *parent, struct commit *commit, const char *target)
+{
+	found = 0;
+	diff_tree_sha1(parent->tree->object.sha1, commit->tree->object.sha1, "", target);
+	if (found)
+		printf("%s\n", sha1_to_hex(commit->object.sha1));
+	return 0;
+}
+
+/*
+ * Walk the set of parents, and collect a list of the objects. 
+ */
+void process_commit(struct commit *item)
+{
+	struct commit_list *parents;
+
+	if (parse_commit(item))
+		die("unable to parse commit %s", sha1_to_hex(item->object.sha1));
+	
+	parents = item->parents;
+	while (parents) {
+		process_commit(parents->item);
+		parents = parents->next;
+	}
+}
+
+/*
+ * Usage: find-changes <parent-id> <filename>
+ *
+ * Note that this code will find the commits that change the given
+ * file in the set of commits that are parents of the one given on the
+ * command line.
+ */ 
+
+int main(int argc, char **argv)
+{
+	int i;
+	char sha1[20];
+	struct commit *orig;
+
+	if (argc != 3) 
+		usage("find-changes <parent-id> <filename>");
+		
+	get_sha1_hex(argv[1], sha1);
+	orig = lookup_commit(sha1);
+	process_commit(orig);
+	mark_reachable(&orig->object, 1);
+
+	/* this code needs to use tree.c to do most of the work - this
+	 * will simplify things a lot. 
+	 * XXX: rewrite diff-tree.c to do the same. */
+	
+	for (i = 0; i < nr_objs; i++) {
+		struct object *obj = objs[i];
+		struct commit *commit;
+		struct commit_list *p;
+
+		if (obj->type != commit_type)
+			continue;
+
+		commit = (struct commit *) obj;
+
+		p = commit->parents;
+		while (p) {
+			process_diffs(p->item, commit, argv[2]);
+			p = p->next;
+		}
+	}
+	return 0;
+}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: Git fork removal?
From: Petr Baudis @ 2005-04-28  9:10 UTC
  To: Daniel Barkalow; +Cc: git
In-Reply-To: <Pine.LNX.4.21.0504272221030.30848-100000@iabervon.org>

Dear diary, on Thu, Apr 28, 2005 at 04:47:24AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > If this breaks your workflow, could you please describe it? Perhaps we
> > could find a good semantics to support both.
> 
> The part that I'm worried about is the way I turn a mass of debugging and
> little local commits into a clean patch series. I've got a working fork
> "barkalow", which is the result of a bunch of stuff and a dozen
> commits. It is derived from "linus". I want to split up the changes and
> make a series of commits, each of which will be a patch to submit.
> 
> 1) I fork "linus" into "for-linus". I go into "for-linus".
> 
> 2) I do "git diff this:barkalow > patch". This gives me the complete set
>    of changes I want to submit.
> 
> 3) I cut down the diff to a single logical change by removing all of the
>    other hunks.
> 
> 4) I do "git apply < patch". I do "git commit". I describe the logical
>    change.
> 
> 5) I go back to step 2, unless I'm done.
> 
> 6) For each of the commits between "linus" and "for-linus", I do 
>    "git patch <commit>", and send out the result.
> 
> The thing that I think requires the symlinks is step 2, which requires
> that there be somewhere I can run git and have it able to see a pair of
> unrelated local heads and the relevant trees.

Just do cg-pull barkalow, to get the latest changes from that repository
(perhaps clone should inherit branches information?).

But if you want Linus to pull from your tree, you generally want it to
be clean - that is, you want to manage clean separation (as Pavel Machek
describes in his document). That is another advantage of hardlinking -
you don't get any unrelated stuff in if you don't explicitly pull it, so
you can keep your for-linus branch clean. I'd do cg-diff linus:this in
the barkalow branch instead to keep this property.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: kernel.org now has gitweb installed
From: David Woodhouse @ 2005-04-28  9:23 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, Git Mailing List
In-Reply-To: <1114676955.12012.346.camel@baythorne.infradead.org>

On Thu, 2005-04-28 at 09:29 +0100, David Woodhouse wrote:
> Let us assume that C and A were already in Linus' tree (and on our web
> page) yesterday. Thus, they should be last. The newly-pulled stuff
> should be first -- FEDBCA.
> 
> I'd say "depth-first, remote parent first" but that would actually
> show 'A' (as a parent of D) long before it shows C. Walking of remote
> parents should stop as soon as we hit a commit which was accessible
> through a more local parent, rather than as soon as we hit a commit
> which we've already printed.

Walk the tree once. For each commit, count the number of _children_.
That's not hard -- each new commit you find below HEAD has one child to
start with, then you increment that figure by one each time you find
another path to the same commit.

When printing, you walk the tree depth-first, remote-parent-first. If
you hit a commit with multiple children, decrement its count by one. If
the count is still non-zero, ignore that commit (and its parents) and
continue. If the count _is_ zero, then this is the "most local" path to
the commit in question, so print it and continue to process its
parents...

(Actually I'd probably do it by adding real pointers to the children
instead of using a counter. Operations like convert-cache would be far
better off working that way round, and 'cg comments' is going to need to
do something very similar to convert-cache.)

-- 
dwmw2

^ permalink raw reply

* Re: kernel hacker's git howto
From: David Greaves @ 2005-04-28 10:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: kernel list, pasky, torvalds, Greg KH, GIT Mailing Lists
In-Reply-To: <20050428085657.GA30800@elf.ucw.cz>

I think a lot of people on the git list would like to see this - please 
CC :)

David

Pavel Machek wrote:

>Hi!
>
>Here's my current version of git HOWTO. I'd like your comments...
>
>	Kernel hacker's guide to git
>	~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      2005 Pavel Machek <pavel@suse.cz>
>
>You can get cogito at http://www.kernel.org/pub/software/scm/cogito/
>. Compile it, and place it somewhere in $PATH. Then you can get kernel
>by running
>
>mkdir clean-cg; cd clean-cg
>cg-init rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>
>... Do cg-update origin to pick up latest changes from Linus. You can
>do cg-diff to see what changes you have made in your local tree. cg-cancel
>will kill any such changes, and cg-commit will make them permanent.
>
>To get diff between your working tree and "next tree up", do cg-diff
>-r origin: . If you want to get the same diff but separated
>patch-by-patch, do cg-mkpatch origin: . If you want to pull changes
>from the "up" tree to your working tree, do cg-pull origin followed by
>cg-merge origin.
>
>
>How to set up your trees so that you can cooperate with linus
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>What I did:
>
>Created clean-cg. Initialized straight from Linus (as above). Then I
>created "nice" tree, good for Linus to pull from 
>
>mkdir /data/l/linux-good; cd /data/l/linux-good
>cg-init /data/l/clean-cg
>
>and then my working tree, based on linux-good
>
>mkdir /data/l/linux-cg; cd /data/l/linux-cg
>cg-init /data/l/linux-good
>
>. I do my work in linux-cg. If someone sends me nice patch I should
>pass up, I apply it to linux-good with nice message and do
>
>cd /data/l/linux-cg; cg-pull origin; cg-merge origin
>
>  
>

-- 


^ permalink raw reply

* Re: Finding file revisions
From: Thomas Gleixner @ 2005-04-27 18:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.58.0504271027460.18901@ppc970.osdl.org>

On Wed, 2005-04-27 at 10:34 -0700, Linus Torvalds wrote:

> > This will scale pretty badly as the tree grows, but 
> > I usually only want to search back a few months in the history.  So, it 
> > might make sense to limit the results by date or commit/tag.
> 
> With more history, "rev-list" should do basically the right thing: it will
> be constant-time for _recent_ commits, and it is linear time in how far
> back you want to go. Which seems quite reasonable.

Which is quite horrible, if you have a 500k+ blobs repo.

I know you are database allergic, but there a database is the correct
solution. Having stored all the relations of those file/tree/commit
blobs in a database it takes <20ms to have a list of all those file
blobs in historical order with some context information retrieved. That's
not on a monster machine, it's on an ordinary Walmart PC.

tglx

^ permalink raw reply

* Re: [ANNOUNCE] gitkdiff 0.1
From: Ingo Molnar @ 2005-04-28 10:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: git
In-Reply-To: <4270711F.7020501@gmail.com>

* Tejun Heo <htejun@gmail.com> wrote:

>  Hello, guys.
> 
>  I've hacked tkdiff and made a commit viewing utility.  Just download
> the following tarball and unpack it wherever PATH points to.  It
> assumes that all base git executables are visible via PATH.
> 
>  http://home-tj.org/gitui/files/gitui-200504281405.tar.gz

very nice!

there's only one other utility i'm missing: a tool that does the 
equivalent of 'bk annotate' - and to possibly integrate it with gitkdiff 
and git-viz. That would make 'history browsing' very powerful: to 
flexibly switch between changeset history graph view, annotated file 
view and changeset history within one utility.

	Ingo

^ permalink raw reply

* Re: [ANNOUNCE] gitkdiff 0.1
From: Tejun Heo @ 2005-04-28 11:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: git
In-Reply-To: <20050428103655.GA14076@elte.hu>

Ingo Molnar wrote:
> * Tejun Heo <htejun@gmail.com> wrote:
> 
> 
>> Hello, guys.
>>
>> I've hacked tkdiff and made a commit viewing utility.  Just download
>>the following tarball and unpack it wherever PATH points to.  It
>>assumes that all base git executables are visible via PATH.
>>
>> http://home-tj.org/gitui/files/gitui-200504281405.tar.gz
> 
> 
> very nice!
> 
> there's only one other utility i'm missing: a tool that does the 
> equivalent of 'bk annotate' - and to possibly integrate it with gitkdiff 
> and git-viz. That would make 'history browsing' very powerful: to 
> flexibly switch between changeset history graph view, annotated file 
> view and changeset history within one utility.
> 
> 	Ingo

  Actually, I am thinking about making a full gui history thing.  With 
commit history graph, annotated file history and all that stuff (I 
think it will look a lot like bk revtool).  Can't say how long it will 
take but maybe in a week.  So, if you have some ideas/suggestions, 
please let me know.

-- 
tejun

^ permalink raw reply

* Re: The criss-cross merge case
From: David Roundy @ 2005-04-28 11:16 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.21.0504272209390.30848-100000@iabervon.org>

On Wed, Apr 27, 2005 at 10:19:17PM -0400, Daniel Barkalow wrote:
> On Thu, 28 Apr 2005, Benedikt Schmidt wrote:
> > Ok, darcs doesn't handle block moves, so there is no need for an
> > algorithm that supports them (yet). Is there any free SCM that has
> > support for block moves at the moment? It seems like clearcase detects
> > them, but I don't know where it takes advantage of it.
> 
> I would think that darcs would be able to do neat things in its merger if
> it knew about block moves. Obviously, it only makes sense to add support
> for identifying them and using them at the same time.

Indeed, handling block moves would definitely be *very* nice.  An ancient
version of darcs actually did this (it's not in the current darcs history,
since it was so ancient and buggy), although it had a terrible diff
algorithm.  But I really didn't understand the theory back then, and when I
rewrote everything, I never added the block moves back in.  They complicate
conflict situations a bit, and once I found that darcs was actually
useable, I started focusing on other issues.  (Most recently efficiency
issues.)
-- 
David Roundy
http://www.darcs.net

^ permalink raw reply

* Re: [darcs-devel] Re: Darcs-git pulling from the Linux repo: a Linux VM question
From: David Roundy @ 2005-04-28 11:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Juliusz Chroboczek, Git Mailing List, darcs-devel
In-Reply-To: <Pine.LNX.4.58.0504270910510.18901@ppc970.osdl.org>

On Wed, Apr 27, 2005 at 09:16:03AM -0700, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Juliusz Chroboczek wrote:
> > Here we're speaking about the initial import.  Committed on 17 April
> > 2005 by Linus Torvalds, with the comment ``Let it rip''.  220 MB of
> > changed files in a single commit.  2 minutes real time just to read
> > all the files, never mind doing anything useful with them.
> 
> I think you may well want to consider the initial commit special. In many 
> ways it is - it has no parents etc, so even apart from the fact that the 
> initial commit obviously tends to be a lot bigger than any other commit, 
> it actually fundamentally is _technically_ different too.

This has been discussed, and while I'm not opposed to special-casing the
initial commit, mostly we've taken the stance so far of not special-casing.
It's much nicer if we can make darcs efficient enough to perform the
initial commit without a special case, which has the nice side-effect of
also improving other cases.

When we're desperate, we'll special-case the initial commit, but currently
I'm sure we can pretty easily adjust things by making the git-tree-reading
lazy, which should pretty well address both the memory and speed
concerns--and also improve performance of other commands.  Perhaps more to
the point, it will also ensure that the same optimizations that work for
working with darcs repos will help when dealing with git repos.
-- 
David Roundy
http://www.darcs.net

^ permalink raw reply

* Re: Finding file revisions
From: Chris Mason @ 2005-04-28 11:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504271506290.18901@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1884 bytes --]

On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > So, new prog attached.  New usage:
> >
> > file-changes [-c commit_id] [-s commit_id] file ...
> >
> > -c is the commit where you want to start searching
> > -s is the commit where you want to stop searching
>
> Your script will do some funky stuff, because you incorrectly think that
> the rev-list is sorted linearly. It's not. It's sorted in a rough
> chronological order, but you really can't do the "last" vs "cur" thing
> that you do, because two commits after each other in the rev-list listing
> may well be from two totally different branches, so when you compare one
> tree against the other, you're really doing something pretty nonsensical.

One more rev that should work as you suggested.  Here's the example output 
from a cogito changeset with merges.  I print the diff-tree lines once for each 
matching parent and then print the commit once.  It's very primitive, but
hopefully some day someone will make a gui with happy clicky buttons
for changesets and filerevs.

diff-tree -r 2544d7558f0ce94ab9c163f5b67244f71d8c85b8 69eeae031bf5447e99b9274761e2361e8c5a944e
618fdb616cebbd2fc9f1cddc0b6b75fd575250a1->3579b5fd1182679a39b83eaaa9dd0e7c970f4545 diff-tree.c
diff-tree -r 9831d8f86095edde393e495d7a55cab9d35d5d05 69eeae031bf5447e99b9274761e2361e8c5a944e
2d2913b6b98ac836b43755b1304d2a838dad87dd->4f01bbbbb3fd0e53e9ce968f167b6dae68fcfa92 Makefile
cat-file commit 69eeae031bf5447e99b9274761e2361e8c5a944e
    tree 7510dc1b63e9e690ec73952e40a31e43af4b55bc
    parent 2544d7558f0ce94ab9c163f5b67244f71d8c85b8
    parent 9831d8f86095edde393e495d7a55cab9d35d5d05
    author Petr Baudis <pasky@ucw.cz> 1114544917 +0200
    committer Petr Baudis <xpasky@machine.sinus.cz> 1114544917 +0200

    Merge with rsync://www.kernel.org/pub/linux/kernel/people/torvalds/git.git

-chris

[-- Attachment #2: file-changes --]
[-- Type: application/x-perl, Size: 2385 bytes --]

^ permalink raw reply

* Re: Finding file revisions
From: Chris Mason @ 2005-04-28 11:56 UTC (permalink / raw)
  To: simon; +Cc: Linus Torvalds, git
In-Reply-To: <20050428084156.GK17682@himi.org>

On Thursday 28 April 2005 04:41, Simon Fowler wrote:
> I've got a version of this written in C that I've been working on
> for a bit - some example output:
>
> +040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
> +100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
> +100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
> +100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
> commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4
>
> You run it as:
> find-changes commit_id file_prefix ...
>
> The file_prefix is a path prefix to match - it's not as flexible as
> regexes, but it shouldn't be too much less useful.

I dropped the regexes for speed with diff-tree, they weren't that important to 
me...The features I was going for are:

1) ability to see the changeset comments in the output.
2) ability to look for revs on more than one file at a time.  The single file 
limit in bk revtool always bugged me.
3) Some quick cut n' paste method to generate the changeset diff.  This is why 
I do diff-tree -r in the output, so I can just copy into a different window 
and go.

Your c version would hopefully end up faster on cpu time by limiting the 
number of times we read/decompress the commit files.

-chris

^ permalink raw reply

* RT[0/3]: Some related random thoughts
From: Kris Shannon @ 2005-04-28 12:59 UTC (permalink / raw)
  To: GIT Mailing List

I've had a number of thoughts about the "supposed" missing SCM features of git.

1) Alternate Encodings (including on-disk delta compression)
    If the default objects filename doesn't exist, we can try for other
    alternative encodings e.g.
    00/a29c403e751c2a2a61eb24fa2249c8956d1c80.xdelta which can specify
    the object content as a delta or other ingenious idea...

2) Rename/Code-movement Tracking (file and/or function)
    Additional object type tag(s) "rename" which references a changeset
    and lists the movement metadata

3) SHA1 backwards reference cache
    Allows quickly finding all commits which reference a given tree root,
    all/the "rename" for a given commit, all xdeltas which use this blob.

There are quite a few important issues with all 3 of these ideas so I
thought I would elaborate each in separate emails... (coming soon :)

-- 
Kris Shannon <kris.shannon.kernel@gmail.com>

^ permalink raw reply

* Re: Finding file revisions
From: David Woodhouse @ 2005-04-28 13:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git
In-Reply-To: <200504271423.37433.mason@suse.com>

[-- Attachment #1: Type: text/plain, Size: 2865 bytes --]

On Wed, 2005-04-27 at 14:23 -0400, Chris Mason wrote:
> Thanks.  I originally called diff-tree without the file list so that I could 
> do the regexp matching, but this is probably one of those features that will 
> never get used.

When I added this functionality to diff-tree I didn't want to add regexp
support, but I did make sure it could handle the simple case of "changes
within directory xxx/yyy". It can also take _multiple_ names. 

At the same time, I also posted a primitive script which attempted to do
something similar to what you're doing. The output of rev-tree is
useless, as Linus pointed out. Chronological sorting is
counterproductive in all cases and should be avoided _everywhere_.

My script is based on the original 'gitlog.sh' script, which walks the
commit tree from the head to its parents. It lists only those commits
where the file(s) in question actually changed, giving the commit ID and
the changes.

There's one problem with that already documented in my (attached) mail
-- we don't print merge changesets where the file in the child is
identical to the file in all the parents, but the changeset in question
_is_ relevant to the history because it's merging two branches on which
the file _independently_ changed.

The other problem is that we still don't have enough information to
piece together the full tree. With each commit we print, we're also
printing the last _relevant_ child (see $lastprinted in the script). 

That allows us to piece together most of the graph, but when we
eventually reach a commit which has already been processed (but not
necessarily _printed_), we just stop -- so we don't have useful parent
information for the oldest change in each branch and can't tie it back
to the point at which it branched. We know the _immediate_ parent, but
that parent isn't necessarily going to have been one of the commits we
actually printed.

I suspect the best way to do this is to start with a copy of rev-tree
and do something like..

	1. Add a 'struct commit_list children' to 'struct commit'

	2. Make process_commit() set it correctly:
@@ wherever @@ process_commit
	        while (parents) {
	                process_commit(parents->item->object.sha1);
+	                commit_list_insert(obj, &parents->item->children);
	                parents = parents->next;
	        }

	3. Check each 'interesting' commit to see if it affects the
	   file(s) in question.

	4. Prune the tree: For each commit which isn't a merge and which
	   doesn't touch the file(s), just dump it from the tree,
	   changing the child pointer of its parent and the parent
	   pointer of its child accordingly to maintain the tree.
	   For each merge where there are no changes to the file(s)
	   between the merge point and the point at which the branch was
	   taken, drop that too.

	5. Print the remaining commits.

-- 
dwmw2

[-- Attachment #2: Attached message - Re: [GIT PATCH] Selective diff-tree --]
[-- Type: message/rfc822, Size: 7296 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 2904 bytes --]

On Wed, 2005-04-13 at 14:57 +0100, David Woodhouse wrote:
> The plan is that this will also form the basis of a tool which will report the
> revision tree for a given file, which is why I really want to avoid the
> unnecessary recursion rather than just post-processing the output.

Script attached. Its output is something like this:

commit 97c9a63e76bf667c21f24a5cfa8172aff0dd1294 child
*100664->100644 blob    6e4064e920792d5b0219b9f8f55a38ab4a1af856->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac      rev-tree.c

--------------------------
commit 13b6f29ac1686955e15f0250f796362460b4992e child 97c9a63e76bf667c21f24a5cfa8172aff0dd1294
*100644->100644 blob    5b3090780d49cc610339a19f070a5954dce9a8bc->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac      rev-tree.c

--------------------------
commit 6420f0732f695269c0e3f28e62ed4b9aa6578d9f child 13b6f29ac1686955e15f0250f796362460b4992e
*100644->100644 blob    7429b9c4d0aab2e4a494eb4b65129a59da138106->5b3090780d49cc610339a19f070a5954dce9a8bc      rev-tree.c
*100664->100644 blob    28a980482bf2053e022409cc3e50b2ad8adafd55->5b3090780d49cc610339a19f070a5954dce9a8bc      rev-tree.c

 <...>

As we walk the tree from the HEAD to its parents, we print only those
commits which modify the file(s) in question. We remember the last
commit we printed as we recurse, so that we can generate a complete
graph. The SHA-1 of the blobs themselves aren't good enough on their own
because they're not guaranteed to be unique -- if the same change
happens on two different branches, the SHA-1 will be the same, and we
won't know how it fits together.

As it is, it's not quite perfect because I'm still omitting merge
commits where the resulting file is identical to the same file in _all_
of the parents. So if we have the following tree (for the _file_):

       ----- (AB) ----,
      /                \ 
  (A) ------ (AB) ----- (AB) --,
      \                         \
       ----- (AC) --------------(ABC)

(Where the delta A->AB is a trivial one-line fix which two people
independently reproduce, then they merge their trees together)

.. the point where the two independent instances of (AB) are merged
together won't be shown in the output of the attached script. The output
would show only this:

       ----- (AB) ----,
      /                \ 
  (A) ------ (AB) ----- (ABC)
      \                /           
       ----- (AC) ----'

Do we care about this? Or is it good enough? I don't really want to emit
output for _every_ merge commit we traverse, just in _case_ it happens
to be relevant later. Should I just give in to the voices in my head which
are telling me I should throw the damn thing away and rewrite it in C?

Given this output, it should be possible to display a pretty graph of
the history of the file, and easily find both diffs and whole files.
Creating a graphical tool which does this is left as an exercise for the
reader.

-- 
dwmw2

[-- Attachment #2.1.2: gitfilelog.sh --]
[-- Type: application/x-shellscript, Size: 1983 bytes --]

^ permalink raw reply
